General thoughts
- If a simple algorithm can solve the problem, do we need a more complex one?
- Most companies use multiple models to solve a single problem.
- Similar to software development, algorithm development needs to iterate as well: model updates and feature engineering.
- When the scale grows, we may need to build a platform to support more algorithm development.
- The evaluation of online inference may differ from that of offline testing. We need to align the two.
- Split online training and offline training
- Do not adopt micro-services for their own sake. Implement them at different granularities.
Sessions
Session 1 by Prof. Hui WEI from Fudan
Theme: AI
Takeaway: We need to treat AI carefully.
Limited definitions of AI:
- based on code
- can be defined literally
- can be defined by algorithms
- can be computed
Why can the game of Go be solved by algorithms?
- clean data
- limited rules
- huge amount of tagged data
Unsolved question: how does knowledge guide our action without programmed routines?
Possible direction: a lightweight solution for inference
Readings:
David Kirsh, Foundations of AI: The Big Issues. Artificial Intelligence. (1991). http://adrenaline.ucsd.edu/kirsh/Articles/BigIssues/big-issues.pdf
Session 2 by Zaiqing NIE from Alibaba
Theme: AliGene
Takeaway: An easy-to-use NLP platform is a new trend.
An HCI platform based on NLP
Request > Intent detection > Slot filling (knowledge base, user profiling) > dialogue management (predefined) > Response
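The pipeline above can be sketched end to end. This is a toy illustration: the intents, keywords, slot values, and replies are all invented, and a real system would back slot filling with a knowledge base and user profile.

```python
# Toy NLU pipeline: request -> intent detection -> slot filling
# -> (predefined) dialogue management -> response.
# Intents, slot patterns, and replies below are illustrative only.

INTENT_KEYWORDS = {
    "weather": ["weather", "rain", "sunny"],
    "music":   ["play", "song", "music"],
}

def detect_intent(request):
    words = request.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in words for k in keywords):
            return intent
    return "unknown"

def fill_slots(request, intent):
    # A real system consults a knowledge base and user profile;
    # here we just look for a known city name.
    slots = {}
    if intent == "weather":
        for city in ("beijing", "shanghai", "hangzhou"):
            if city in request.lower():
                slots["city"] = city
    return slots

def manage_dialogue(intent, slots):
    # Predefined dialogue policy: answer if slots are complete,
    # otherwise ask a clarifying question.
    if intent == "weather":
        if "city" in slots:
            return f"Checking the weather in {slots['city']}."
        return "Which city do you mean?"
    if intent == "music":
        return "Playing some music."
    return "Sorry, I did not understand."

def respond(request):
    intent = detect_intent(request)
    slots = fill_slots(request, intent)
    return manage_dialogue(intent, slots)
```

Asking a clarifying question when a slot is missing is also where active learning (below) can plug in: the question reduces the system's uncertainty.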
Challenges:
- Ambiguity of natural language
- limited tagged data
- high expectation of precision
- limited NLP developers
Solutions:
- Cold start: rules
- Warm start: deep learning
- Reinforcement learning
- Active learning (use questions to increase certainty)
Readings:
Agichtein, E., & Gravano, L. (2000, June). Snowball: Extracting relations from large plain-text collections. In Proceedings of the fifth ACM conference on Digital libraries (pp. 85-94). ACM. http://www.mathcs.emory.edu/%7Eeugene/papers/dl00.pdf
Session 3 by Hui Wang from Paypal
Theme: Risk Management by Using AI
Takeaway: A single problem might need multiple models and rules
For AI, risk management has no limits on data dimensionality or volume.
Data sources:
- account behaviors
- sessions (visiting history)
- negative trading history
- operational data
- external data
Detection order:
- transactions
- account level data
- multi account level (related accounts)
Story-based approach to decrease false positives
Run multiple models and rules before making decisions on transactions - all within milliseconds.
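That decision flow might look like the following sketch. The rules, models, feature names, and the 0.7 threshold are all invented stand-ins for a real risk system's components.

```python
# Sketch of combining hard rules with multiple model scores before
# deciding on a transaction. Rules, models, and thresholds are invented.

def rule_blocklisted(txn):
    return txn["account"] in {"acct_fraud_1"}

def rule_amount_limit(txn):
    return txn["amount"] > 10000

def model_velocity_score(txn):
    # Stand-in for a trained model: more recent activity -> higher risk.
    return min(1.0, txn["txns_last_hour"] / 20.0)

def model_geo_score(txn):
    return 0.9 if txn["country"] != txn["home_country"] else 0.1

RULES = [rule_blocklisted, rule_amount_limit]
MODELS = [model_velocity_score, model_geo_score]

def decide(txn, threshold=0.7):
    # Hard rules veto immediately; otherwise average the model scores.
    if any(rule(txn) for rule in RULES):
        return "decline"
    avg = sum(m(txn) for m in MODELS) / len(MODELS)
    return "decline" if avg >= threshold else "approve"
```

In production every rule and model in this chain would have to fit inside the millisecond budget mentioned above, so they typically run in parallel.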
Session 4 by Xiongjie LIAO from TalkingData
Theme: Monitoring Micro-service
Takeaway: Monitoring only the key metrics is more effective and efficient.
Micro-service split-up:
- horizontal
- vertical
- hybrid
Problems of micro-service:
- deployment management
- complicated workflow
- complicated monitoring
Approaches to monitoring:
- focusing on performance (lightweight)
- focusing on workflow (currently the mainstream)
Philosophy of TalkingData’s solution:
- performance monitoring first
- only monitor possible bottlenecks (events VS workflows)
- locate the specific code that causes bottlenecks
Key metrics to monitor:
- API response time
- throughput
- API response components
- network IO
- API workflow time (can use dubbo to monitor)
- slow request stacks
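A minimal sketch of the "performance monitoring first" idea: record just response time and call count per API with a decorator, instead of tracing whole workflows. The metric fields and the `get_user` endpoint are invented.

```python
# Record key per-API metrics (calls, total/max response time) with a
# lightweight decorator rather than full workflow tracing.
import time
from collections import defaultdict

METRICS = defaultdict(lambda: {"calls": 0, "total_ms": 0.0, "max_ms": 0.0})

def monitored(name):
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                ms = (time.perf_counter() - start) * 1000
                m = METRICS[name]
                m["calls"] += 1            # throughput
                m["total_ms"] += ms        # for mean response time
                m["max_ms"] = max(m["max_ms"], ms)  # worst case
        return inner
    return wrap

@monitored("get_user")
def get_user(uid):
    return {"id": uid}
```

Slow-request stacks would then only be captured for calls whose recorded time exceeds a threshold, keeping the overhead low.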
Session 6 by Yuefeng ZHOU from Google Brain
Theme: Tensorflow as a Computing Framework
Takeaway: We can use machine learning to optimize models / algorithms / computation.
New features:
- Eager Execution
- 2nd TPU
- Tensorflow APIs (Keras 2.0)
- Estimator
- Tensorflow Lite 1. on mobile device 2. offline inference 3. APM-optimised
- New Input Pipeline
1. input pipeline = lazy lists
2. map + filter
3. tf.data
- Dataset > functional programming
- Iterator > sequential input
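The "input pipeline = lazy lists" idea can be sketched with plain Python generators. This is only an analogy: tf.data.Dataset provides the real map/filter/batch API with prefetching and parallelism.

```python
# Lazy-list input pipeline sketched with generators: nothing is
# computed until the pipeline is iterated, mirroring tf.data's model.

def dataset(records):
    yield from records

def map_fn(fn, ds):
    for x in ds:
        yield fn(x)

def filter_fn(pred, ds):
    for x in ds:
        if pred(x):
            yield x

def batch(n, ds):
    buf = []
    for x in ds:
        buf.append(x)
        if len(buf) == n:
            yield buf
            buf = []
    if buf:
        yield buf

# Compose transformations lazily, then iterate once at training time.
pipeline = batch(2, filter_fn(lambda x: x % 2 == 0,
                              map_fn(lambda x: x * 3, dataset(range(6)))))
```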
Learn2learn algorithms for selecting the best models
- based on RNN
- evolution algorithms 1. select the best as the parent 2. pick two randomly 3. drop the lower one 4. copy and mutate the parent
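A sketch of that tournament-style evolution loop, with a toy numeric "model" and an invented fitness function standing in for trained-model accuracy:

```python
# Tournament evolution sketch: repeatedly pick two candidates at
# random, drop the worse one, and replace it with a mutated copy of
# the better one (the parent). "Model" here is just a number.
import random

def fitness(model):
    return -abs(model - 42)  # toy objective: get close to 42

def mutate(model, rng):
    return model + rng.choice([-1, 1])

def evolve(population, steps, seed=0):
    rng = random.Random(seed)
    pop = list(population)
    for _ in range(steps):
        i, j = rng.sample(range(len(pop)), 2)  # pick two at random
        if fitness(pop[i]) < fitness(pop[j]):
            i, j = j, i                        # i is now the parent
        pop[j] = mutate(pop[i], rng)           # drop loser, copy + mutate
    return max(pop, key=fitness)
```

Because the best member can only be replaced by something at least as fit, the best fitness in the population never decreases.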
Reinforcement learning for device placement
- many-device training (distributed)
- design for bigger models
- design for larger batch size
Session 7 by Yunsong GUO from Pinterest
Theme: Homefeed Recommendation in Pinterest
Takeaway:
- Feature engineering may bring higher returns than applying new models.
- Iterating models / features is a possible way to increase performance.
- A simple algorithm may perform better than a complicated model.
- Iterate quickly.
- Online inference may be evaluated differently from offline training.
Features to use:
- users
- pins
- interactions
Workflow:
- candidate generation (reduce data amount) 1. collaborative filtering (use different pins from the same boards) 2. random walking
- machine learning algorithms
1. time based
2. logistic regressions
- LR
- SVM (sofia-ml)
- feature engineering
- the best feature may increase relevance by 4%
- age is not a useful feature in this case
- GBDT
- use with an ensembling model
- split sparse and non-sparse features first
- similar effect as word-embedding
- feed generation (may reduce relevance)
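The collaborative-filtering step of candidate generation (surface pins that share a board with pins the user saved) might be sketched like this; the boards and pins are invented.

```python
# Board-based candidate generation sketch: score each unseen pin by
# how often it co-occurs on a board with one of the user's saved pins.
from collections import Counter

BOARDS = {
    "b1": ["pin1", "pin2", "pin3"],
    "b2": ["pin2", "pin4"],
    "b3": ["pin5", "pin6"],
}

def candidates(saved_pins, k=3):
    scores = Counter()
    for pins in BOARDS.values():
        overlap = [p for p in pins if p in saved_pins]
        if overlap:
            for p in pins:
                if p not in saved_pins:
                    scores[p] += len(overlap)
    return [p for p, _ in scores.most_common(k)]
```

The ranking models (LR, GBDT) then only have to score this much smaller candidate set.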
Feature engineering
- specialized < needs to train internal domain experts
- high failure rate < be patient, iterate faster, reduce deployment period
- low transparency < record experiments
- high labor cost < build a platform to support
- high return < higher than new models
Cold start problem:
- ask questions
- use data from similar users
Evaluation differences between online inference and offline training:
- find the correlation between online and offline standards
- map the offline standard to the online one
- reduce experiment period
- filter out low-return experiments
Diversity problem:
- use recommendations from different categories
- drop some relevance for diversity
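One way to trade a little relevance for diversity is a greedy re-rank that penalizes categories already shown; the penalty value and the items below are invented.

```python
# Greedy diversity re-ranking sketch: at each position pick the item
# with the highest relevance minus a penalty per repeat of its category.

def diversify(items, penalty=0.5):
    # items: list of (id, category, relevance)
    chosen, seen = [], {}
    pool = list(items)
    while pool:
        best = max(pool, key=lambda it: it[2] - penalty * seen.get(it[1], 0))
        chosen.append(best[0])
        seen[best[1]] = seen.get(best[1], 0) + 1
        pool.remove(best)
    return chosen
```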
Session 8 by Lou YIN from Airbnb
Theme: GB Decision Table
Takeaway: Data structure may have a huge impact on performance.
Best practices:
- set depth to a limited number
- set shrinkage 1. default is 1 2. better to use a smaller one, e.g. 0.01
- sub-sampling 1. similar to SGD
- tradeoff of model size 1. accuracy ↔ scoring time
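To make the shrinkage point concrete, here is a toy 1-D gradient-boosting loop with depth-1 stumps; the data and settings are invented. A smaller shrinkage means each round contributes only a fraction of the fitted residual, which is why more rounds (a bigger model) are needed.

```python
# Toy 1-D gradient boosting with decision stumps (depth limited to 1)
# showing the shrinkage parameter at work.

def fit_stump(xs, residuals):
    # Best single-threshold split minimizing squared error.
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

def boost(xs, ys, rounds=200, shrinkage=0.1):
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + shrinkage * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]
    return base, shrinkage, stumps

def predict(model, x):
    base, shrinkage, stumps = model
    return base + sum(shrinkage * (lm if x <= t else rm)
                      for t, lm, rm in stumps)
```

Sub-sampling (fitting each stump on a random subset, similar to SGD) would slot into the loop right before `fit_stump`.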
Decision table VS decision tree
- freedom: decision table < decision tree
- speed: decision table > decision tree (50% gain)
- variance and bias: decision table < decision tree
Data structure of decision table
Optimization of decision table in gradient boosting
- backfitting:
- cyclic (recommended)
- random (recommended)
- greedy (biggest gain, but slow)
- speed up scoring
- avoid repeated tests
- sort the table by values
- online inference 1. pre-set user features
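The scoring trick behind decision tables can be sketched as follows: the outcomes of d (feature, threshold) tests form a d-bit index into a flat array of 2^d values, so scoring is d comparisons and one lookup with no tree traversal. The table contents below are invented.

```python
# Decision-table scoring sketch: d boolean tests -> d-bit index ->
# direct lookup, avoiding repeated tests and branchy tree traversal.

def score(table, x):
    tests, values = table  # tests: [(feature_idx, threshold)]
    idx = 0
    for feature, threshold in tests:
        idx = (idx << 1) | (x[feature] <= threshold)
    return values[idx]

# A depth-2 table over features 0 and 1; the 4 leaf values are invented.
table = ([(0, 5.0), (1, 2.0)], [0.1, 0.2, 0.3, 0.4])
```

Because every input runs the same fixed sequence of tests, sorting the table by threshold values and batching lookups is straightforward, which is where the ~50% scoring speedup comes from.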
Feature engineering of GBDT
- feature discretisation
- reduce outliers
- equi-width
- equi-frequency
- not good for linear features (pre-compute a linear model)
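Equi-width vs equi-frequency binning in a small sketch; the sample values are invented, and the outlier (100) shows why equi-width edges need outlier reduction first.

```python
# Feature discretisation sketch: equi-width bins split the value range
# evenly; equi-frequency bins put (roughly) equal counts in each bin.

def equi_width_edges(xs, bins):
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / bins
    return [lo + i * step for i in range(1, bins)]

def equi_freq_edges(xs, bins):
    xs = sorted(xs)
    return [xs[len(xs) * i // bins] for i in range(1, bins)]

def discretise(x, edges):
    return sum(x > e for e in edges)  # bin index
```

With the outlier present, equi-width puts almost everything in bin 0, while equi-frequency still separates the bulk of the data.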
Readings:
Lou, Y., & Obukhov, M. (2017). BDT: Gradient Boosted Decision Tables for High Accuracy and Scoring Efficiency (pp. 1893–1901). Presented at the 23rd ACM SIGKDD International Conference, New York, NY, USA: ACM Press. https://yinlou.github.io/papers/lou-kdd17.pdf
Session 9 by Eric KIM from LinkedIn
Theme: Helix and Nuage
Takeaway:
- Ease of use and operability should be planned in advance.
- Applying algorithms in engineering is a new trend.
Helix: data storing platform
Nuage: data consuming platform
Invisible:
- ultra-simple to use
- elastic and scale easily
- no single point of failure
- highly operable
challenges:
- fault-tolerance
- data replication
- job management
- load balancing
- high performance
- failure fix
capacity detection:
- pessimistic model
- ARIMA
Session 12 by Zhiyuan ZONG from QIY
Theme: Risk Management in QIY
Takeaway: Always clarify and focus on the needs.
- High precision, low recall
- Split account behavior from textual data
- Use LSTM to predict user behaviors (sequence prediction)
Session 13 by Ming HUANG from Tencent
Theme: Spark on Angel
Takeaway: Modify a tool if needed.
Needs: solve big models on big data
- driver is the main bottleneck
- needs to reduce dimension
- executors need to wait (PST mode)
Architecture
- mutable layer on immutable layer
- PS model
- mutable
- no new memory allocated
- operate PS model in order to operate on remote servers
- server: flexible
- model layer: virtual
- client: replaceable with DL / spark
Angel API
- simplify writing procedure
- a unified API
- vector (inherit from Spark: Breeze PS, Cached PS)
- MLLib in Spark is based on Breeze
- easy to migrate to Angel
- matrix
Session 14 by Chao Can from Amazon
Theme: Micro-service and Serverless
Takeaway
- Refactoring to micro-services is an option.
- Consider data structures when refactoring.
- Micro-services come in different granularities; choose accordingly.
The proper team size at Amazon is about the number of people who can share two pizzas.
Refactoring:
- make code as modules
- interface
- make facade
- split database
- build local mock
Auto-scaling is not at run time, because it needs time to load.
Moving to serverless
- benchmark everything
- functions should be stateless
- use AWS Step Functions
- be careful with FAAS (function as a service)
- keep it natural: use the CLI to control Lambda
Readings: Domain-Driven Design; The Art of Scalability
Session 15 by Youlin LI from Facebook
Theme: Real-time training of Newsfeed
Takeaway: Follow the defined goal to develop and iterate.
Slogan of Facebook:
- Focus on impact
- Move fast
Daily deployment
FB Joiner captures all events for a given story/session within a window, and outputs the joined result.
Newsfeed:
- rule-based ranking -> machine-learning-based ranking
- use feedback to retrain model
- real-time joining
- time window: 3 min (invalid if a session is longer than 3 mins)
- tolerate value losses (0.5% loss)
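The real-time join described above might be sketched like this: an impression waits up to the window for a matching click and is joined as a positive; impressions whose window expires are emitted as negatives. The event tuples and policy are simplified, and a small fraction of lost events is tolerated as the notes say.

```python
# Windowed stream-join sketch for (impression, click) training labels.

WINDOW = 180  # seconds; the talk mentions a 3-minute window

def join(events):
    # events: (timestamp, kind, story_id) with kind "show" or "click"
    pending, out = {}, []
    for ts, kind, sid in sorted(events):
        # Flush impressions whose window has expired -> negative label.
        for old_sid, old_ts in list(pending.items()):
            if ts - old_ts > WINDOW:
                out.append((old_sid, 0))
                del pending[old_sid]
        if kind == "show":
            pending[sid] = ts
        elif kind == "click" and sid in pending:
            out.append((sid, 1))  # joined within the window -> positive
            del pending[sid]
    for sid in pending:  # end of stream: emit remaining as negatives
        out.append((sid, 0))
    return out
```

The joined (story, label) pairs are what feed the real-time retraining of the ranking model.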