Movatterモバイル変換


[0]ホーム

URL:


Skip to main content
Ctrl+K

pandas.merge_asof#

pandas.merge_asof(left,right,on=None,left_on=None,right_on=None,left_index=False,right_index=False,by=None,left_by=None,right_by=None,suffixes=('_x','_y'),tolerance=None,allow_exact_matches=True,direction='backward')[source]#

Perform a merge by key distance.

This is similar to a left-join except that we match on nearestkey rather than equal keys. Both DataFrames must be sorted by the key.

For each row in the left DataFrame:

  • A “backward” search selects the last row in the right DataFrame whose‘on’ key is less than or equal to the left’s key.

  • A “forward” search selects the first row in the right DataFrame whose‘on’ key is greater than or equal to the left’s key.

  • A “nearest” search selects the row in the right DataFrame whose ‘on’key is closest in absolute distance to the left’s key.

Optionally match on equivalent keys with ‘by’ before searching with ‘on’.

Parameters:
leftDataFrame or named Series
rightDataFrame or named Series
onlabel

Field name to join on. Must be found in both DataFrames.The data MUST be ordered. Furthermore this must be a numeric column,such as datetimelike, integer, or float. On or left_on/right_onmust be given.

left_onlabel

Field name to join on in left DataFrame.

right_onlabel

Field name to join on in right DataFrame.

left_indexbool

Use the index of the left DataFrame as the join key.

right_indexbool

Use the index of the right DataFrame as the join key.

bycolumn name or list of column names

Match on these columns before performing merge operation.

left_bycolumn name

Field names to match on in the left DataFrame.

right_bycolumn name

Field names to match on in the right DataFrame.

suffixes2-length sequence (tuple, list, …)

Suffix to apply to overlapping column names in the left and rightside, respectively.

toleranceint or Timedelta, optional, default None

Select asof tolerance within this range; must be compatiblewith the merge index.

allow_exact_matchesbool, default True
  • If True, allow matching with the same ‘on’ value(i.e. less-than-or-equal-to / greater-than-or-equal-to)

  • If False, don’t match the same ‘on’ value(i.e., strictly less-than / strictly greater-than).

direction‘backward’ (default), ‘forward’, or ‘nearest’

Whether to search for prior, subsequent, or closest matches.

Returns:
DataFrame

See also

merge

Merge with a database-style join.

merge_ordered

Merge with optional filling/interpolation.

Examples

>>>left=pd.DataFrame({"a":[1,5,10],"left_val":["a","b","c"]})>>>left    a left_val0   1        a1   5        b2  10        c
>>>right=pd.DataFrame({"a":[1,2,3,6,7],"right_val":[1,2,3,6,7]})>>>right   a  right_val0  1          11  2          22  3          33  6          64  7          7
>>>pd.merge_asof(left,right,on="a")    a left_val  right_val0   1        a          11   5        b          32  10        c          7
>>>pd.merge_asof(left,right,on="a",allow_exact_matches=False)    a left_val  right_val0   1        a        NaN1   5        b        3.02  10        c        7.0
>>>pd.merge_asof(left,right,on="a",direction="forward")    a left_val  right_val0   1        a        1.01   5        b        6.02  10        c        NaN
>>>pd.merge_asof(left,right,on="a",direction="nearest")    a left_val  right_val0   1        a          11   5        b          62  10        c          7

We can use indexed DataFrames as well.

>>>left=pd.DataFrame({"left_val":["a","b","c"]},index=[1,5,10])>>>left   left_val1         a5         b10        c
>>>right=pd.DataFrame({"right_val":[1,2,3,6,7]},index=[1,2,3,6,7])>>>right   right_val1          12          23          36          67          7
>>>pd.merge_asof(left,right,left_index=True,right_index=True)   left_val  right_val1         a          15         b          310        c          7

Here is a real-world times-series example

>>>quotes=pd.DataFrame(...{..."time":[...pd.Timestamp("2016-05-25 13:30:00.023"),...pd.Timestamp("2016-05-25 13:30:00.023"),...pd.Timestamp("2016-05-25 13:30:00.030"),...pd.Timestamp("2016-05-25 13:30:00.041"),...pd.Timestamp("2016-05-25 13:30:00.048"),...pd.Timestamp("2016-05-25 13:30:00.049"),...pd.Timestamp("2016-05-25 13:30:00.072"),...pd.Timestamp("2016-05-25 13:30:00.075")...],..."ticker":[..."GOOG",..."MSFT",..."MSFT",..."MSFT",..."GOOG",..."AAPL",..."GOOG",..."MSFT"...],..."bid":[720.50,51.95,51.97,51.99,720.50,97.99,720.50,52.01],..."ask":[720.93,51.96,51.98,52.00,720.93,98.01,720.88,52.03]...}...)>>>quotes                     time ticker     bid     ask0 2016-05-25 13:30:00.023   GOOG  720.50  720.931 2016-05-25 13:30:00.023   MSFT   51.95   51.962 2016-05-25 13:30:00.030   MSFT   51.97   51.983 2016-05-25 13:30:00.041   MSFT   51.99   52.004 2016-05-25 13:30:00.048   GOOG  720.50  720.935 2016-05-25 13:30:00.049   AAPL   97.99   98.016 2016-05-25 13:30:00.072   GOOG  720.50  720.887 2016-05-25 13:30:00.075   MSFT   52.01   52.03
>>>trades=pd.DataFrame(...{..."time":[...pd.Timestamp("2016-05-25 13:30:00.023"),...pd.Timestamp("2016-05-25 13:30:00.038"),...pd.Timestamp("2016-05-25 13:30:00.048"),...pd.Timestamp("2016-05-25 13:30:00.048"),...pd.Timestamp("2016-05-25 13:30:00.048")...],..."ticker":["MSFT","MSFT","GOOG","GOOG","AAPL"],..."price":[51.95,51.95,720.77,720.92,98.0],..."quantity":[75,155,100,100,100]...}...)>>>trades                     time ticker   price  quantity0 2016-05-25 13:30:00.023   MSFT   51.95        751 2016-05-25 13:30:00.038   MSFT   51.95       1552 2016-05-25 13:30:00.048   GOOG  720.77       1003 2016-05-25 13:30:00.048   GOOG  720.92       1004 2016-05-25 13:30:00.048   AAPL   98.00       100

By default we are taking the asof of the quotes

>>>pd.merge_asof(trades,quotes,on="time",by="ticker")                     time ticker   price  quantity     bid     ask0 2016-05-25 13:30:00.023   MSFT   51.95        75   51.95   51.961 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.982 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.933 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.934 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

We only asof within 2ms between the quote time and the trade time

>>>pd.merge_asof(...trades,quotes,on="time",by="ticker",tolerance=pd.Timedelta("2ms")...)                     time ticker   price  quantity     bid     ask0 2016-05-25 13:30:00.023   MSFT   51.95        75   51.95   51.961 2016-05-25 13:30:00.038   MSFT   51.95       155     NaN     NaN2 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.933 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.934 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

We only asof within 10ms between the quote time and the trade timeand we exclude exact matches on time. Howeverprior data willpropagate forward

>>>pd.merge_asof(...trades,...quotes,...on="time",...by="ticker",...tolerance=pd.Timedelta("10ms"),...allow_exact_matches=False...)                     time ticker   price  quantity     bid     ask0 2016-05-25 13:30:00.023   MSFT   51.95        75     NaN     NaN1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.982 2016-05-25 13:30:00.048   GOOG  720.77       100     NaN     NaN3 2016-05-25 13:30:00.048   GOOG  720.92       100     NaN     NaN4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

[8]ページ先頭

©2009-2025 Movatter.jp