pandas.merge_asof#
- pandas.merge_asof(left,right,on=None,left_on=None,right_on=None,left_index=False,right_index=False,by=None,left_by=None,right_by=None,suffixes=('_x','_y'),tolerance=None,allow_exact_matches=True,direction='backward')[source]#
Perform a merge by key distance.
This is similar to a left-join except that we match on nearestkey rather than equal keys. Both DataFrames must be first sorted bythe merge key in ascending order before calling this function.Sorting by any additional ‘by’ grouping columns is not required.
For each row in the left DataFrame:
A “backward” search selects the last row in the right DataFrame whose‘on’ key is less than or equal to the left’s key.
A “forward” search selects the first row in the right DataFrame whose‘on’ key is greater than or equal to the left’s key.
A “nearest” search selects the row in the right DataFrame whose ‘on’key is closest in absolute distance to the left’s key.
Optionally match on equivalent keys with ‘by’ before searching with ‘on’.
- Parameters:
- leftDataFrame or named Series
First pandas object to merge.
- rightDataFrame or named Series
Second pandas object to merge.
- onlabel
Field name to join on. Must be found in both DataFrames.The data MUST be in ascending order. Furthermore this must bea numeric column, such as datetimelike, integer, or float.
onorleft_on/right_onmust be given.- left_onlabel
Field name to join on in left DataFrame. If specified, sort the leftDataFrame by this column in ascending order before merging.
- right_onlabel
Field name to join on in right DataFrame. If specified, sort the rightDataFrame by this column in ascending order before merging.
- left_indexbool
Use the index of the left DataFrame as the join key.
- right_indexbool
Use the index of the right DataFrame as the join key.
- bycolumn name or list of column names
Match on these columns before performing merge operation. It is not requiredto sort by these columns.
- left_bycolumn name
Field names to match on in the left DataFrame.
- right_bycolumn name
Field names to match on in the right DataFrame.
- suffixes2-length sequence (tuple, list, …)
Suffix to apply to overlapping column names in the left and rightside, respectively.
- toleranceint or timedelta, optional, default None
Select asof tolerance within this range; must be compatiblewith the merge index.
- allow_exact_matchesbool, default True
If True, allow matching with the same ‘on’ value(i.e. less-than-or-equal-to / greater-than-or-equal-to)
If False, don’t match the same ‘on’ value(i.e., strictly less-than / strictly greater-than).
- direction‘backward’ (default), ‘forward’, or ‘nearest’
Whether to search for prior, subsequent, or closest matches.
- Returns:
- DataFrame
A DataFrame of the two merged objects, containing all rows from theleft DataFrame and the nearest matches from the right DataFrame.
See also
mergeMerge with a database-style join.
merge_orderedMerge with optional filling/interpolation.
Examples
>>>left=pd.DataFrame({"a":[1,5,10],"left_val":["a","b","c"]})>>>left a left_val0 1 a1 5 b2 10 c
>>>right=pd.DataFrame({"a":[1,2,3,6,7],"right_val":[1,2,3,6,7]})>>>right a right_val0 1 11 2 22 3 33 6 64 7 7
>>>pd.merge_asof(left,right,on="a") a left_val right_val0 1 a 11 5 b 32 10 c 7
>>>pd.merge_asof(left,right,on="a",allow_exact_matches=False) a left_val right_val0 1 a NaN1 5 b 3.02 10 c 7.0
>>>pd.merge_asof(left,right,on="a",direction="forward") a left_val right_val0 1 a 1.01 5 b 6.02 10 c NaN
>>>pd.merge_asof(left,right,on="a",direction="nearest") a left_val right_val0 1 a 11 5 b 62 10 c 7
We can use indexed DataFrames as well.
>>>left=pd.DataFrame({"left_val":["a","b","c"]},index=[1,5,10])>>>left left_val1 a5 b10 c
>>>right=pd.DataFrame({"right_val":[1,2,3,6,7]},index=[1,2,3,6,7])>>>right right_val1 12 23 36 67 7
>>>pd.merge_asof(left,right,left_index=True,right_index=True) left_val right_val1 a 15 b 310 c 7
Here is a real-world times-series example
>>>quotes=pd.DataFrame(...{..."time":[...pd.Timestamp("2016-05-25 13:30:00.023"),...pd.Timestamp("2016-05-25 13:30:00.023"),...pd.Timestamp("2016-05-25 13:30:00.030"),...pd.Timestamp("2016-05-25 13:30:00.041"),...pd.Timestamp("2016-05-25 13:30:00.048"),...pd.Timestamp("2016-05-25 13:30:00.049"),...pd.Timestamp("2016-05-25 13:30:00.072"),...pd.Timestamp("2016-05-25 13:30:00.075"),...],..."ticker":[..."GOOG",..."MSFT",..."MSFT",..."MSFT",..."GOOG",..."AAPL",..."GOOG",..."MSFT",...],..."bid":[720.50,51.95,51.97,51.99,720.50,97.99,720.50,52.01],..."ask":[720.93,51.96,51.98,52.00,720.93,98.01,720.88,52.03],...}...)>>>quotes time ticker bid ask0 2016-05-25 13:30:00.023 GOOG 720.50 720.931 2016-05-25 13:30:00.023 MSFT 51.95 51.962 2016-05-25 13:30:00.030 MSFT 51.97 51.983 2016-05-25 13:30:00.041 MSFT 51.99 52.004 2016-05-25 13:30:00.048 GOOG 720.50 720.935 2016-05-25 13:30:00.049 AAPL 97.99 98.016 2016-05-25 13:30:00.072 GOOG 720.50 720.887 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>>trades=pd.DataFrame(...{..."time":[...pd.Timestamp("2016-05-25 13:30:00.023"),...pd.Timestamp("2016-05-25 13:30:00.038"),...pd.Timestamp("2016-05-25 13:30:00.048"),...pd.Timestamp("2016-05-25 13:30:00.048"),...pd.Timestamp("2016-05-25 13:30:00.048"),...],..."ticker":["MSFT","MSFT","GOOG","GOOG","AAPL"],..."price":[51.95,51.95,720.77,720.92,98.0],..."quantity":[75,155,100,100,100],...}...)>>>trades time ticker price quantity0 2016-05-25 13:30:00.023 MSFT 51.95 751 2016-05-25 13:30:00.038 MSFT 51.95 1552 2016-05-25 13:30:00.048 GOOG 720.77 1003 2016-05-25 13:30:00.048 GOOG 720.92 1004 2016-05-25 13:30:00.048 AAPL 98.00 100
By default we are taking the asof of the quotes
>>>pd.merge_asof(trades,quotes,on="time",by="ticker") time ticker price quantity bid ask0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.961 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.982 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.933 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.934 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only asof within 2ms between the quote time and the trade time
>>>pd.merge_asof(...trades,quotes,on="time",by="ticker",tolerance=pd.Timedelta("2ms")...) time ticker price quantity bid ask0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.961 2016-05-25 13:30:00.038 MSFT 51.95 155 NaN NaN2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.933 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.934 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only asof within 10ms between the quote time and the trade timeand we exclude exact matches on time. Howeverprior data willpropagate forward
>>>pd.merge_asof(...trades,...quotes,...on="time",...by="ticker",...tolerance=pd.Timedelta("10ms"),...allow_exact_matches=False,...) time ticker price quantity bid ask0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.982 2016-05-25 13:30:00.048 GOOG 720.77 100 NaN NaN3 2016-05-25 13:30:00.048 GOOG 720.92 100 NaN NaN4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN