- API reference
- General functions
- pandas.merge_asof
pandas.merge_asof#
- pandas.merge_asof(left,right,on=None,left_on=None,right_on=None,left_index=False,right_index=False,by=None,left_by=None,right_by=None,suffixes=('_x','_y'),tolerance=None,allow_exact_matches=True,direction='backward')[source]#
Perform a merge by key distance.
This is similar to a left-join except that we match on nearestkey rather than equal keys. Both DataFrames must be sorted by the key.
For each row in the left DataFrame:
A “backward” search selects the last row in the right DataFrame whose‘on’ key is less than or equal to the left’s key.
A “forward” search selects the first row in the right DataFrame whose‘on’ key is greater than or equal to the left’s key.
A “nearest” search selects the row in the right DataFrame whose ‘on’key is closest in absolute distance to the left’s key.
Optionally match on equivalent keys with ‘by’ before searching with ‘on’.
- Parameters:
- leftDataFrame or named Series
- rightDataFrame or named Series
- onlabel
Field name to join on. Must be found in both DataFrames.The data MUST be ordered. Furthermore this must be a numeric column,such as datetimelike, integer, or float. On or left_on/right_onmust be given.
- left_onlabel
Field name to join on in left DataFrame.
- right_onlabel
Field name to join on in right DataFrame.
- left_indexbool
Use the index of the left DataFrame as the join key.
- right_indexbool
Use the index of the right DataFrame as the join key.
- bycolumn name or list of column names
Match on these columns before performing merge operation.
- left_bycolumn name
Field names to match on in the left DataFrame.
- right_bycolumn name
Field names to match on in the right DataFrame.
- suffixes2-length sequence (tuple, list, …)
Suffix to apply to overlapping column names in the left and rightside, respectively.
- toleranceint or Timedelta, optional, default None
Select asof tolerance within this range; must be compatiblewith the merge index.
- allow_exact_matchesbool, default True
If True, allow matching with the same ‘on’ value(i.e. less-than-or-equal-to / greater-than-or-equal-to)
If False, don’t match the same ‘on’ value(i.e., strictly less-than / strictly greater-than).
- direction‘backward’ (default), ‘forward’, or ‘nearest’
Whether to search for prior, subsequent, or closest matches.
- Returns:
- DataFrame
See also
merge
Merge with a database-style join.
merge_ordered
Merge with optional filling/interpolation.
Examples
>>>left=pd.DataFrame({"a":[1,5,10],"left_val":["a","b","c"]})>>>left a left_val0 1 a1 5 b2 10 c
>>>right=pd.DataFrame({"a":[1,2,3,6,7],"right_val":[1,2,3,6,7]})>>>right a right_val0 1 11 2 22 3 33 6 64 7 7
>>>pd.merge_asof(left,right,on="a") a left_val right_val0 1 a 11 5 b 32 10 c 7
>>>pd.merge_asof(left,right,on="a",allow_exact_matches=False) a left_val right_val0 1 a NaN1 5 b 3.02 10 c 7.0
>>>pd.merge_asof(left,right,on="a",direction="forward") a left_val right_val0 1 a 1.01 5 b 6.02 10 c NaN
>>>pd.merge_asof(left,right,on="a",direction="nearest") a left_val right_val0 1 a 11 5 b 62 10 c 7
We can use indexed DataFrames as well.
>>>left=pd.DataFrame({"left_val":["a","b","c"]},index=[1,5,10])>>>left left_val1 a5 b10 c
>>>right=pd.DataFrame({"right_val":[1,2,3,6,7]},index=[1,2,3,6,7])>>>right right_val1 12 23 36 67 7
>>>pd.merge_asof(left,right,left_index=True,right_index=True) left_val right_val1 a 15 b 310 c 7
Here is a real-world times-series example
>>>quotes=pd.DataFrame(...{..."time":[...pd.Timestamp("2016-05-25 13:30:00.023"),...pd.Timestamp("2016-05-25 13:30:00.023"),...pd.Timestamp("2016-05-25 13:30:00.030"),...pd.Timestamp("2016-05-25 13:30:00.041"),...pd.Timestamp("2016-05-25 13:30:00.048"),...pd.Timestamp("2016-05-25 13:30:00.049"),...pd.Timestamp("2016-05-25 13:30:00.072"),...pd.Timestamp("2016-05-25 13:30:00.075")...],..."ticker":[..."GOOG",..."MSFT",..."MSFT",..."MSFT",..."GOOG",..."AAPL",..."GOOG",..."MSFT"...],..."bid":[720.50,51.95,51.97,51.99,720.50,97.99,720.50,52.01],..."ask":[720.93,51.96,51.98,52.00,720.93,98.01,720.88,52.03]...}...)>>>quotes time ticker bid ask0 2016-05-25 13:30:00.023 GOOG 720.50 720.931 2016-05-25 13:30:00.023 MSFT 51.95 51.962 2016-05-25 13:30:00.030 MSFT 51.97 51.983 2016-05-25 13:30:00.041 MSFT 51.99 52.004 2016-05-25 13:30:00.048 GOOG 720.50 720.935 2016-05-25 13:30:00.049 AAPL 97.99 98.016 2016-05-25 13:30:00.072 GOOG 720.50 720.887 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>>trades=pd.DataFrame(...{..."time":[...pd.Timestamp("2016-05-25 13:30:00.023"),...pd.Timestamp("2016-05-25 13:30:00.038"),...pd.Timestamp("2016-05-25 13:30:00.048"),...pd.Timestamp("2016-05-25 13:30:00.048"),...pd.Timestamp("2016-05-25 13:30:00.048")...],..."ticker":["MSFT","MSFT","GOOG","GOOG","AAPL"],..."price":[51.95,51.95,720.77,720.92,98.0],..."quantity":[75,155,100,100,100]...}...)>>>trades time ticker price quantity0 2016-05-25 13:30:00.023 MSFT 51.95 751 2016-05-25 13:30:00.038 MSFT 51.95 1552 2016-05-25 13:30:00.048 GOOG 720.77 1003 2016-05-25 13:30:00.048 GOOG 720.92 1004 2016-05-25 13:30:00.048 AAPL 98.00 100
By default we are taking the asof of the quotes
>>>pd.merge_asof(trades,quotes,on="time",by="ticker") time ticker price quantity bid ask0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.961 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.982 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.933 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.934 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only asof within 2ms between the quote time and the trade time
>>>pd.merge_asof(...trades,quotes,on="time",by="ticker",tolerance=pd.Timedelta("2ms")...) time ticker price quantity bid ask0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.961 2016-05-25 13:30:00.038 MSFT 51.95 155 NaN NaN2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.933 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.934 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only asof within 10ms between the quote time and the trade timeand we exclude exact matches on time. Howeverprior data willpropagate forward
>>>pd.merge_asof(...trades,...quotes,...on="time",...by="ticker",...tolerance=pd.Timedelta("10ms"),...allow_exact_matches=False...) time ticker price quantity bid ask0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.982 2016-05-25 13:30:00.048 GOOG 720.77 100 NaN NaN3 2016-05-25 13:30:00.048 GOOG 720.92 100 NaN NaN4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN