Movatterモバイル変換

Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Speaker Deck

Speaker Deck

Residual2Vec: Debiasing graph embedding with r...

January 25, 2022

1.6k

Residual2Vec: Debiasing graph embedding with random graphs

■イベント：NeurIPS 2021 論文読み会
https://line.connpass.com/event/236385/

■登壇概要
タイトル：Residual2Vec: Debiasing graph embedding with random graphs
発表者：   技術本部研究開発部　黒木裕鷹

▼Twitter
https://twitter.com/SansanRandD

Sansan DSOC

January 25, 2022

More Decks by Sansan DSOC

See All by Sansan DSOC

Sansan x atmaCup #12 Solution概要 / Sansan×atmaCup#12 Solution Summary

0

2.8k

問題設定とデータの説明 / Problem setting and Data description_Sansan x atmaCup #12

0

2.7k

1st Place Solution / 1st Place Solution Sansan x atmaCup #12

0

2.6k

学生部門1位 (6th Place Solution) / 1st Place Solution in The Student Category (6th Place Solution)

0

2.7k

⼊社から現在までの取り組み / What I have been working on since joining the company

0

490

学⽣時代の経験は研究開発業務で活きるのか / Will experience as a student be useful in R&D work?

0

310

Bill Oneデータ化の安定稼働 / Efforts to ensure stable operation of digitalization at Bill One

0

420

画像系研究員の今までとこれから / The past and future of a researcher

1

510

実務と研究のはざまで格闘する / Struggling between practice and research

0

310

Other Decks in Technology

See All in Technology

How Do I Contact Jetblue Airlines® Reservation Number: Fast Support Guide

thejetblueairhelpsupport

0

150

[SRE NEXT 2025] すみずみまで暖かく照らすあなたの太陽でありたい

2

470

毎晩の負荷試験自動実行による効果

recruitengineers

5

180

Deep Security Conference 2025：生成AI時代のセキュリティ監視 /dsc2025-genai-secmon

4

2.8k

無理しない AI 活用サービス / #jazug

0

100

LLM拡張解体新書/llm-extension-deep-dive

oracle4engineer

23

6.2k

CDK Vibe Coding Fes

1

630

LIXIL基幹システム刷新に立ち向かう技術的アプローチについて

1

380

研究開発部メンバーの働き⽅ / Sansan R&D Profile

3

18k

ソフトウェアテストのAI活用_ver1.25

1

610

ソフトウェアQAがハードウェアの人になったの

3

200

助けて! XからWaylandに移行しないと新しいGNOMEが使えなくなっちゃう 2025-07-12

2

200

Featured

See All Featured

29

5.4k

The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024

26

2.9k

個人開発の失敗を避けるイケてる考え方 / tips for indie hackers

108

19k

What’s in a name? Adding method to the madness

productmarketing

23

3.5k

Put a Button on it: Removing Barriers to Going Fast.

60

3.9k

Building Better People: How to give real-time feedback that sticks.

367

19k

[RailsConf 2023 Opening Keynote] The Magic of Rails

29

9.6k

Exploring the Power of Turbo Streams & Action Cable | RailsConf2023

34

5.9k

Producing Creativity

346

40k

A better future with KSS

238

17k

I Don’t Have Time: Getting Over the Fear to Launch Your Podcast

32

2.4k

How STYLIGHT went responsive

100

5.6k

Transcript

Residual2Vec: Debiasing graph embedding with random graphs Sansan株式会社技術本部研究開発部
⿊⽊裕鷹 NeurIPS 2021 論⽂読み会 2/25
Data Strategy and Operation Center ⾃⼰紹介⿊⽊裕鷹オンライン名刺 •
2020年4⽉⼊社 • 4⽉の⻑野マラソンに出場 Yutaka Kuroki Sansan 株式会社技術本部研究開発部 Data Analysis Group 研究員プロダクト戦略室兼務 @kur0cky_y
Data Strategy and Operation Center 論⽂情報 • 筆頭著者は Indiana⼤学のポスドク
• 複雑ネットワークと CS (ML) の交差点に興味がありそう • 剣道 3段らしい著者： NeurIPS 2021 ポスター学会：
Data Strategy and Operation Center 1ページでまとめ • ランダムウォークベースの node embedding
では，ノードのサンプリングに偏りが⽣じる • 偏り除去のための埋め込みアルゴリズム “residual2vec” を提案 > word2vec の Negative Sampling から着想 > 任意のグラフモデルを仮定し，それを超える情報を⾏列分解 > 偏りの除去により，後段タスクの性能が向上することを確認 • コードは github ( https://github.com/skojaku/residual2vec )
背景友達のパラドックス
Data Strategy and Operation Center 友情のパラドックス「友達の⽅が友達が多い」
Data Strategy and Operation Center 友情のパラドックス「友達の⽅が友達が多い」 • 任意のノードについて，その隣接ノードの⽅が平均的に次数が⼤きい •
いわゆる “スケールフリー性” に起因 > ⼀部のノードが多数のエッジを占有する状態（ハブの存在） > 現実の多くのグラフデータに認められる性質 à グラフ上のランダムウォークも次数由来の “偏り” を受ける
Data Strategy and Operation Center 簡単な例 • 単純な⽣成モデル (Stochastic Block
Model; SBM) でのシミュレーション • 20％のコアノードが，ランダムウォーク系列の過半数を占める • 次数の⼩さいノードが過⼩評価されることにつながる Figure 4 (Kojaku et al., 2021)
Data Strategy and Operation Center 研究の⽬的 Q1: 次数由来の “偏り” が悪いとすれば，どうすれば取り除けるのか
Q2: そもそも ”偏り” は node embedding にとって良いのか，悪いのか
背景 Skip-Gram Negative Sampling
Data Strategy and Operation Center word2vec (Skip-Gram) 注⽬する単語 𝑖 の周辺に，別の単語
𝑗 が現れる確率を softmax で表す • 最尤法で推定するが，語彙数 𝑁 の増加に応じて分⺟の計算が厳しい Negative Sampling • 実際に現れる単語ペアを正例 (𝑌 ! = 1) に，ランダムな関係ないワード ℓ とのペアを負例 (𝑌 ! = 0) にとり，分類問題を解く • ℓ は，ノイズ分布 𝑝" ℓ = 𝑃 ℓ|𝑌ℓ = 0 に従ってサンプリングする à … à
Data Strategy and Operation Center アイデア • 𝑝! 𝑗 が，ランダムウォークにおける次数の偏りを軽減している
• 𝑝" 𝑗 がベースラインとなり，類似度 𝑢$ %𝑣! がそこからの偏差とみなせる • SGNS では 𝑝" 𝑗 が単語の頻度によって決まるため，得られる 𝑢$ %𝑣! では頻度の偏りがかかっていない • 𝑝! ℓ は⾃由に設定することができる • 任意のグラフモデルを仮定し利⽤することができる • また，よく知られたnull model であれば，解析的に求まる
提案⼿法
Data Strategy and Operation Center 提案⼿法：residual2vec 1. グラフの “null model”
を与える 2. 埋め込むグラフデータ上のランダムウォーク， null model の⽣成グラフ上のランダムウォーク系列をそれぞれ考える 3. 2つの系列を⽐較し null model に含まれない残情報 (residual information) を抽出 > ここでの情報とは，PMI (Pointwise Mutual Information) ⾏列 > null model によっては，null 側の系列を作成せずに済む（解析的に求める） 4. PMI⾏列の差（残情報）を⾏列因⼦分解して低次元近似
Data Strategy and Operation Center 提案⼿法：residual2vec Figure 2 (Kojaku et
al., 2021)
実験・結果
Data Strategy and Operation Center 実験 1. Link prediction >
様々なグラフデータセットで評価（実験の詳細がちょっと謎） > 次数の偏りを除くため Configuration model (Fosdick et al., 2018) を null model に > AUC-ROC で評価 2. Community detection > データはベンチマークモデル (Lancichinetti et al., 2009)からの⽣成 > 1 vs rest の AUC-ROC で評価 3. Case Study > 掲載論⽂誌と出版年ごとにノードとした引⽤ネットワーク (Web of Science) > 次数の偏りだけでなく，昔の論⽂ほど引⽤が多くなる偏りがある > 時系列の偏りを取り除くため，出版年をブロックとした degree-corrected SBM (Karrer and Newman, 2011) を null model に
Data Strategy and Operation Center 結果 (Link prediction, Community detection)
Figure 3 (Kojaku et al., 2021) 提案⼿法がほぼ最良の結果に
Data Strategy and Operation Center 結果（引⽤ネットワークの埋め込み） • Glove, node2vec では昔のノードが中⼼に来ている（時間バイアス）
• 提案⼿法では取り除けている Figure 4 (Kojaku et al., 2021)
Data Strategy and Operation Center 結果（引⽤ネットワークの埋め込み）分野・IFの予測では提案⼿法がほぼ最良の結果に Figure 4 (Kojaku
et al., 2021)
Data Strategy and Operation Center まとめ・感想 • 次数に起因する “偏り” を除くことを⽬的とした
node embedding を開発 • Negative Sampling から着想を得て，ノイズ分布に任意のランダムグラフを仮定する residual2vec を提案 • PMI の残差⾏列を因⼦分解 • シンプルなアイデアで効果的な結果を出していてすごい • 複雑ネットワークとグラフ ML の知識を有機的に繋いでいきたい（願望） • 既存研究との⽐較があまりなされていなかった • personalized pagerank を利⽤した debias への⾔及がない • 実験の詳細にかけるところがある
Data Strategy and Operation Center 参考⽂献 • Fosdick, B. K.,
Larremore, D. B., Nishimura, J., & Ugander, J. (2018). Configuring random graph models with fixed degree sequences. Siam Review, 60(2), 315-355. • Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107. • Kojaku, S., Yoon, J., Constantino, I., & Ahn, Y. Y. (2021). Residual2Vec: Debiasing graph embedding with random graphs. Advances in Neural Information Processing Systems, 34. • Lancichinetti, A., & Fortunato, S. (2009). Community detection algorithms: a comparative analysis. Physical review E, 80(5), 056117.
None

[8]ページ先頭

©2009-2025 Movatter.jp