Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

FIX: translate timedeltas in _to_ordinalf#12863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed

Conversation

jklymak
Copy link
Member

@jklymakjklymak commentedNov 21, 2018
edited
Loading

PR Summary

Allows timedeltas to be returned in untis of floating point days indates._to_ordinalf so thatmdates.date2num(timedelta(days=1, hours=2)) returns1.0833.

mdates.date2num(np.timedelta64(26, 'h')) returns1.0833.

Closes#12862

Needs tests and docs....

Needs to see if something similar can be done for np.datetime64 objects?

Note, supercedes#9120, and I think its in the right place (rather than on its own)...

PR Checklist

  • Has Pytest style unit tests
  • Code isFlake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak
Copy link
MemberAuthor

#9120 (comment)

I haven't had a chance to really look at and grok how the units framework works. I suppose it's possible that units are inferred from type and that everything gets translated into a number under the hood and plotted on the same plot,

Thats indeed what happens; datetimes are translated to an ordinal day (days since 1 Jan 0000 plus one). So the proposed behaviour for time deltas in bar plots (for instance) is to translate them to days so they can be relative to an actual datetime value that anchors the bars.

I don't think the approach in#9120 will help much - users are going to supply lists that mix datetime and timedelta objects, so this PR just decides on an element by element basis what to do w the object.

@jklymakjklymakforce-pushed thefix-datetime-timedelta branch 6 times, most recently fromc569ca7 tobec6488CompareNovember 22, 2018 22:06
@jklymak
Copy link
MemberAuthor

This is ready for review


def test_timedelta():
"""
test that timedelta objects are properly translated into days
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
testthattimedeltaobjectsareproperlytranslatedintodays
Testthattimedeltaobjectsareproperlytranslatedintodays.

jklymak reacted with thumbs up emoji
# check that mixed lists work....
assert mdates.date2num(dt)[0] == 730120.0
assert mdates.date2num(dt)[1] == 1 + 2 / 24
dt = (np.datetime64('2000-01-01'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Add a blank line since a new part of the test starts here.

jklymak reacted with thumbs up emoji

elif (isinstance(d, np.ndarray) and
(np.issubdtype(d.dtype, np.datetime64) or
np.issubdtype(d.dtype, np.timedelta64))):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
np.issubdtype(d.dtype,np.timedelta64))):
np.issubdtype(d.dtype,np.timedelta64))):

jklymak reacted with thumbs up emoji
return _to_ordinalf_np_vectorized(d)
elif hasattr(d, 'size') and not d.size:
# this is an empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

an empty what? 😄

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yeah, I'm not sure. If I remove that test (which I didn't write), all the tests pass, so its not tested. OTOH, who knows if someone needs it? Left comment admiring my ignorance, and maybe someone will remove.

dnew = np.zeros(len(d))
for nn, dd in enumerate(d):
dnew[nn] = _dt64_to_ordinalf(dd)
return(dnew)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Does this work as well:

return np.array(_dt64_to_ordinalf(dd) for dd in d)

jklymak reacted with thumbs up emoji
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That fails some tests and returns<generator object _dt64_to_ordinalf_iterable.<locals>.<genexpr> :-(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, maybe this is numpy-version dependent. Anyway, there has to be a way to not add each element one-by-one on the python level, which is slow. My next guess would be:

return np.from_iter((_dt64_to_ordinalf(dd) for dd in d), float, count=len(d))

and if all fails, we can still create a list-comprehension and turn that into an array:

return np.array([_dt64_to_ordinalf(dd) for dd in d])

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

np.fromiter works. Not at all sure I understand why...

FIX: translate timedelta64 in _dt64_to_ordinalf
@pganssle
Copy link
Member

I am not sure how much this addresses my main concerns in#9120. My biggest issue is thattimedelta anddatetime are fundamentally different creatures, despite the fact that they both have to do with time.

Consider:

  1. It doesn't make sense to plotdatetime andtimedelta objects on the same axis, since one is an absolute position and the other is a delta.
  2. The default axis fortimedelta and fordatetime should be completely different. Fortimedelta, you expect an elapsed-time axis denoted in minutes, seconds, hours, days, etc. Fordatetime you expect absolute dates.
  3. Basically all the special time-handling stuff isdatetime specific. It has to handle weird calendrical arithmetic, time zones and a number of other funky things.timedelta can be losslessly converted to a float withtotal_seconds() and converted back totimedelta withtimedelta(seconds=x).

I think thereare complications that you'll run into, particularly having to do with units (and compatibility withnp.timedelta64), but they're completely different from the kinds of complications you expect withdatetime, which is further evidence for not conflating the two abstractions into one.

@jklymak
Copy link
MemberAuthor

@pganssle Unfortunately, we have lots of API that require x and delta x be specified in the same tuple or array, and the units machinery is just called on the whole tuple and array. That works fine for most data (i.e 30 kg +/- 2 kg are both the same units...) but time is time, and in order to support some functions we need to be able to process both t and dt in the same call todate2num((t, dt)). Refactoring those functions to make different calls to the units machinery seems like a far heavier lift than detecting the types here and doing the right thing.

timhoffm
timhoffm previously approved these changesNov 26, 2018
@pganssle
Copy link
Member

@jklymak What do you mean by "the right thing"?

If I doplt.plot([1], [datetime.timedelta(3)]), what would be plotted? What axis would be used?

What happens if I do this:

plt.plot([1,2], [datetime.timedelta(hours=3),datetime.datetime(2018,1,1)])`

Will it throw an exception?

I realize that this is not a simple problem to solve, but if you merge something that just forcesdatetime andtimedelta into the same units automatically, you are creating a situation that is hard to reverse without breaking backwards compatibility. It is worth taking the time to get this right.

(i.e 30 kg +/- 2 kg are both the same units...)

My suggestion for handling spans and other situations like this is the same as it was in#9120. I believe these problems would go away if you did unit conversionsafter any arithmetical operations. By converting everything to float before calculating spans and such, you lose all custom__add__ behavior and the like, which means you can't have a span that is arelativedelta or any other interesting custom behavior.

I strongly recommend against treatingtimedelta and its ilk like adatetime for the short-term expedience of being able toignore the unit analysis.

@jklymak
Copy link
MemberAuthor

The first will have a point at t = 1 Jan 0000, 03:00, and the second will have a point there and at 2018. I don't consider thatso bad, though I agree the user shouldn't mix the vectors in this case, but, garbage in, garbage out.

My suggestion for handling spans and other situations like this is the same as it was in#9120. I believe these problems would go away if you did unit conversionsafter any arithmetical operations.

That would be great, but thatsnot how unit support works right now, and would require alot of engineering to make it work that way.@dopplershift has been thinking about these sorts of issues, and its on our radar to fix units in a "better" way. Though to be honest, the idea of assuming variable 'x' inplot has an__add__ method might be a heavy ask.

I strongly recommend against treatingtimedelta and its ilk like adatetime for the short-term expedience of being able toignore the unit analysis.

Any major changes to the units machinery is likely tonot be backwards compatible anyways. Given that, and the fact such an overhaul is not being actively worked on right now, I think making the current system work better is the less of two evils. I certainly agree with your main point, but practically I still think this PR is specifically helpful.

BTW, I think folks would be willing to have someone dive deep on units if you are interested, but it'd require a MEP first to make sure we all think it is heading in the right direction.

@pganssle
Copy link
Member

The first will have a point at t = 1 Jan 0000, 03:00, and the second will have a point there and at 2018. I don't consider that so bad, though I agree the user shouldn't mix the vectors in this case, but, garbage in, garbage out.

Both of these are decidedly the wrong thing. The first one isn't "garbage in", it's a very reasonable thing to do, actually. There are many experiments where the "time" axis is "time elapsed" not "absolute time" (includingnearly everything I've ever published). Real support for that in the form of a "TimeDeltaLocator" or "ElapsedTimeLocator" would be great.

The second one is nonsense and should either throw an exception or plot some garbage, depending on how it's usually handled when you inappropriately mix units in a single axis.

Any major changes to the units machinery is likely to not be backwards compatible anyways. Given that, and the fact such an overhaul is not being actively worked on right now, I think making the current system work better is the less of two evils. I certainly agree with your main point, but practically I still think this PR is specifically helpful.

I think it's fine to make incremental improvements, but this is the wrong way to do it. I don't have time to dig into this right now, but I think you're in a hole at the moment and the correct thing to do is to stop digging.

If you're looking to fix things like#11290 and#12862, you shouldn't do it in such a way thatalso makes fixing#8869 correctly a backwards-incompatible change.

Copy link
Member

@pgansslepganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Disagree with this approach, per my comments in the thread.

@@ -222,6 +222,10 @@ def _to_ordinalf(dt):
dt = dt.astimezone(UTC)
tzi = UTC

if isinstance(dt, datetime.timedelta):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If this goes through, you need to update the docstring for_to_ordinalf to includetimedelta.

@dopplershift
Copy link
Contributor

I haven't understood all the nuances here, but the problem seems to stem from the fact thattimedelta has valid uses standalone (e.g. should just be another registered type for the unit conversion machinery) and has uses as one of two classes used arithmetically with dates and times.

Is there a particular reason why we're favoring the latter use case over the former? As far as I can tell, we're not successfully solving both use cases here.

@jklymak
Copy link
MemberAuthor

OK, I hadn't checked, but, because we aren't registering timedeltas of either flavours the date converter does not get called if atimedelta is the first element of the array. So, as is, the PR doesn't block whatever needs to be done to make#8869 work.

If I addtimedelta to the registry, then there arestill errors, but since this is not meant to make#8869 work, I'm not planning to work on this.

Overall, its not clear to me that this PR precludes properly handling fulltimedelta arrays, and adapting a new Locator (though is that necessary?) and Formatter (probably more necessary) for that kind of data.

importmatplotlib.pyplotaspltfromdatetimeimportdatetime,timedelta,timezoneplt.scatter([timedelta(hours=3),timedelta(hours=4),timedelta(hours=5)], [1,2,3]);plt.show()

gives

TypeError: float() argument must be a string or a number, not 'datetime.timedelta'
plt.plot([1,2], [datetime(2018,1,1),timedelta(hours=3)])

gives

ValueError: view limit minimum -36834.61875 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units

(Actually, I'm not 100% sure why this happens)

Switching order yields:

plt.plot([1,2], [timedelta(hours=3),datetime(2018,1,1)])
TypeError: float() argument must be a string or a number, not 'datetime.datetime'

Which is all to say that there is still scope for backwards compatible changes...

@jklymak
Copy link
MemberAuthor

@dopplershift sorry, cross post. But you are correct, we are not solving the vector-of-timedelta case. But we are not precluding it from working here either.

@dopplershift
Copy link
Contributor

But we are not precluding it from working here either.

I feel nervous given how much code is floating around special casing all of the various date/time things in the ecosystem that we can make this statement for certain, given that we don't know how to solve#8869.

I'm not saying this is right or wrong, just that we should exercise caution when adding more duct tape to our mess here.

@pganssle
Copy link
Member

I feel nervous given how much code is floating around special casing all of the various date/time things in the ecosystem that we can make this statement for certain, given that we don't know how to solve#8869.

I think theright way to solve#8869and the various "datetime axes can't have a span" issues floating around is clear. It's just not the approach anyone has taken, and if the various people working on it are to be believed (a qualification I only put in place because Ipersonally have not been able to dig into this, I see no reason to doubt them), it's hard to do.

I would be alot more comfortable with a refactoring based on dispatch. Also, throwing this out there, but how about something like this as a temporary measure (using@dataclass because it makes example code shorter):

@dataclassclassDeltaWrapper(float):_value :floatdef__add__(self,other):returnself._value+other

Doesn't have to befloat and it probably needs to be somewhat more complicated than this, but the idea would be that in theplotting code you can detect the presence of a_DeltaWrapper and raise an exception if you try to plot it. In span code it will work as expected. If necessary, this could become a stepping stone to a version that looks like this:

@dataclassclassDeltaWrapper(float):_delta:object# The pre-conversion object_value :float# The post-conversion objectdef__add__(self,other):try:returnself._delta+otherexceptException:returnself._value+other

Then you can incrementally convert over to the unit framework performing arithmetic pre-conversion without it throwing errors. Once everything is changed over, you can dropDeltaWrapper as vestigial.

@jklymak
Copy link
MemberAuthor

Perhaps an easy idea is to add aconvert_units_delta function that could go in parallel toconvert_units but did the right thing for deltas. By default, it could just callconvert_units. Then the task would be finding the places where a delta is expected and callingconvert_units_delta instead ofconvert_units.

@jklymak
Copy link
MemberAuthor

This new commit is a different approach as described above... The converter gets a newdef convert_delta(value, units, axis) method thataxis.convert_units_delta(self, dx) tries to call if the method exists.broken_barh then simply goes through its x values and converts the first element in the tuple differently than the second (delta) value.

I've not changed broken_barv, or the other axes methods where we expect the second element in a tuple etc to be a delta, but this gets the concept across as a proof of concept.

@jklymak
Copy link
MemberAuthor

Ah, OK, I see what the fundamental problem w/broken_barh was, but its not actually a problem for other functions with deltax or deltay (that I've seen).

The issue is thatbarh was doing the conversion on dx, whereas other functions indeed do the conversion onx+dx (i.e. errorbar or fill_between). So as@pganssle said above, we should do the math first and then convert. I mistakenly thought this issue was everywhere, but its not.

New PR on the way for the original issue. Closing this, but I'll open a new one based on this branch, since it does 75% of the work needed for#8869.

Thanks a lot@pganssle for your comments, which were largely correct.

pganssle reacted with thumbs up emoji

@wheeled
Copy link

I may have stumbled across an issue with the timedelta64 handling. The following produces a numpy SystemError:

import matplotlib.pyplot as pltimport matplotlib.dates as plt_dtimport pandas as pdfrom datetime import timedeltatimedelta_series = [timedelta(minutes=n) for n in range(0, 25, 5)]td_df = pd.DataFrame([1, 2, 3, 4, 5], timedelta_series, columns=["Val"])plt.plot(td_df.index, td_df.Val)ax = plt.gca()ax.xaxis.set_major_formatter(plt_dt.DateFormatter('%H:%M'))plt.show()

Looking into the error, it seems thatdates._from_ordinalf is receivingx as atimedelta64[ns] integer, which then produces a meaningless result when:

    dt = (np.datetime64(get_epoch()) +          np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))

This may only be a symptom rather than the root cause, and perhaps DateFormatter is not passing the expected value?

If it is the expected value being passed in this case, then the following would not produce the error...

    dt = (np.datetime64(get_epoch()) +          np.timedelta64(int(np.round(x), 'ns'))

... noting thatx is already in ns (rather than 'us').

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@dopplershiftdopplershiftdopplershift left review comments

@pgansslepgansslepganssle requested changes

@timhoffmtimhoffmtimhoffm left review comments

Assignees
No one assigned
Projects
None yet
Milestone
v3.1.0
Development

Successfully merging this pull request may close these issues.

broken_barh appears not to work with datetime/timedelta objects
5 participants
@jklymak@pganssle@dopplershift@wheeled@timhoffm

[8]ページ先頭

©2009-2025 Movatter.jp