Add support for handling timedelta #19236


Draft

theOehrly wants to merge 18 commits into matplotlib:main from theOehrly:timedelta-handling

Conversation

@theOehrly (Contributor) commented Jan 4, 2021 (edited):

PR Summary

This pull request adds full support for plotting timedelta values similar to the support that is available for plotting datetime values.
It includes converters, conversion functions, formatters and locators.

This allows for plotting datetime.timedelta, numpy.timedelta64 and pandas.Timedelta.

Locators and formatters provide nice ticks as multiples of default units of time (days, hours, ...)

PR Checklist

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (run flake8 on changed files to check).
  • New features are documented, with examples if plot related.
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

Examples

Example 1

Random timeseries data over a timedelta range of 100 days to 102 days

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime
import numpy as np

base = datetime.timedelta(days=100)
timedeltas = np.array([base + datetime.timedelta(minutes=(4 * i))
                       for i in range(720)])
N = len(timedeltas)
np.random.seed(19680801)
y = np.cumsum(np.random.randn(N))

fig, ax = plt.subplots(figsize=(15, 5))
locator = mdates.AutoTimedeltaLocator()
formatter = mdates.AutoTimedeltaFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
ax.plot(timedeltas, y)

example1

Example 2

The same as example 1, but using ConciseTimedeltaFormatter instead of AutoTimedeltaFormatter.

ex2

More detail

Conversion of values

  • timedelta values are converted to days as floating point values (same as internal date handling)
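For illustration, this equivalence can be sketched with plain Python/NumPy (a minimal sketch, not the PR's converter code):

```python
# Sketch only: a timedelta expressed as a float number of days, which is the
# same scale the internal date handling uses.
import datetime
import numpy as np

datetime.timedelta(days=1, hours=12) / datetime.timedelta(days=1)  # -> 1.5
np.timedelta64(36, 'h') / np.timedelta64(1, 'D')                   # -> 1.5
```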

Locators

Fixed/AutoTimedeltaLocator similar to locators for datetime.

  • Fixed: ticks in fixed intervals
  • Auto: automatic intervals as multiples of days, hours, ...

Formatting

  • A function for timedelta string formatting with a template string. This is not natively available (unlike strftime formatting for datetime), so I created my own implementation. I tried to stay as close as possible to the C-standard formatting codes used for datetime. (A rough sketch of the idea is shown after this list.)

    Example: "%d days, %H:%M" --> '12 days, 14:34'

  • Fixed/Auto/ConciseFormatter similar to the datetime formatters

    • Fixed: user-defined format
    • Auto: selects the tick format based on the locator tick level; creates tick labels as full-length timedelta strings
    • Concise: selects the tick format, axis offset and offset format based on the locator tick level
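To make the template-string idea concrete, here is a rough, hypothetical sketch of such a formatting helper (the function name and the handful of codes shown are illustrative only, not the PR's actual implementation):

```python
# Hypothetical sketch: expand a strftime-like template for a timedelta using
# %d (days), %H, %M, %S placeholders. Negative timedeltas are ignored for brevity.
import datetime

def strftimedelta(td, fmt):
    total = int(td.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return (fmt.replace('%d', str(days))
               .replace('%H', f'{hours:02d}')
               .replace('%M', f'{minutes:02d}')
               .replace('%S', f'{seconds:02d}'))

print(strftimedelta(datetime.timedelta(days=12, hours=14, minutes=34),
                    '%d days, %H:%M'))   # -> '12 days, 14:34'
```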

General

General documentation exists but certainly needs improvement; more examples are needed and some parts are still missing.
I have tried to create basic tests for all of the new functionality, but I assume some additional tests may be necessary.

Reasons why I think this is an improvement

  • timedelta is very common, therefore default functionality is nice to have
  • relationship with datetime: provide similar support for both

@jklymak (Member):

Thanks @theOehrly, this looks like a great amount of work. I've taken a quick look, and it looks like a good proof of concept.

Overall I am concerned about feature drift if we have two ways of dealing with dates and times, datetime and timedelta. There is a lot of duplicated code, and that usually means a re-architecture could be considered for better maintainability.

Finding ticks for datetime and timedelta is the same, at least for the first month, so did you consider simply making timedeltas be datetimes (i.e. dt = epoch + timedelta) under the hood (i.e. date2num would just take timedeltas as well as datetimes), and then changing the formatters, and perhaps the locator defaults? For instance, it looks to me like most of the new locators and formatters could be just specialized versions of the old ones with the self.intervald, or formats and offset_formats, changed to not deal with months or larger. Sure, there are a few times when you call timedelta2num instead of date2num, but again, I don't see any reason date2num can't take timedeltas.
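For reference, the suggestion amounts to something like this (a minimal sketch; the epoch value is Matplotlib's default, the rest is plain NumPy rather than Matplotlib API):

```python
# Sketch: anchor timedeltas at the epoch so the existing datetime machinery
# (date2num, date locators/formatters) can be reused unchanged.
import numpy as np

epoch = np.datetime64('1970-01-01')               # Matplotlib's default epoch
td = np.arange(0, 72, 6, dtype='timedelta64[h]')  # timedeltas to plot
dt = epoch + td                                   # now ordinary datetime64 values
```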

@theOehrly (Contributor, Author):

I had basically tried to keep date and timedelta separate. But yes, it turned out to have a lot of duplicate code, which I hadn't quite expected to this extent. In hindsight, it's not that much of a surprise really.
Your ideas and recommendations sound very reasonable. I'm going to work on that.

@jklymak (Member):

Sure - well, keep the old version, as it looks like it mostly works! Is there anything API-like we should discuss? I'll try and bring this up at a developer call, probably next week...

As a selling point, I'd suggest turning a few of your tests into graphical examples and at least adding them to the start of the PR. It helps folks know exactly what you have done and see why this is a useful addition.

@theOehrly (Contributor, Author):

I don't think I have any API-related questions currently, but I will ask if anything comes up.

Keep the old version? What do you mean by that? Should I, for now, just add additional classes based on the date locators/formatters, so that there are two implementations for now?

I will create some graphical examples. That's certainly necessary.

@jklymak (Member):

... I just meant save a branch or make sure it's really clear in the commit tree.

@theOehrly (Contributor, Author):

Ok, sure

@theOehrly (Contributor, Author):

The proposal for using date2num for both date and timedelta works well, although it's a bit more convoluted than I'd like it to be.
I also kept the ability to deal with pandas NaT. The separate function for that is only ever called as a last fallback, before finally throwing an error if pandas NaT wasn't the problem after all. Data that doesn't contain pandas NaT values will never use it.

I do have some issues though with subclassing the AutoDateLocator for timedeltas. If I simply do this...

class AutoTimedeltaLocator(AutoDateLocator):
    def __init__(self, minticks=5, maxticks=None):
        super().__init__(tz=None, minticks=minticks,
                         interval_multiples=True)
        self.maxticks = {YEARLY: 0, MONTHLY: 0, DAILY: 11, HOURLY: 12,
                         MINUTELY: 11, SECONDLY: 11, MICROSECONDLY: 8}
        if maxticks is not None:
            try:
                self.maxticks.update(maxticks)
            except TypeError:
                # Assume we were given an integer. Use this as the maximum
                # number of ticks for every frequency and create a
                # dictionary for this
                self.maxticks = dict.fromkeys(self._freqs, maxticks)
        self.intervald = {
            YEARLY: [1, ],  # cannot be empty
            MONTHLY: [1, ],
            DAILY: [1, 2, 5, 10, 20, 25, 50, 100, 200, 500, 1000, 2000,
                    5000, 10000, 20000, 50000, 100000, 200000, 500000,
                    1000000],
            HOURLY: [1, 2, 3, 4, 6, 8, 12],
            MINUTELY: [1, 2, 3, 5, 10, 15, 20, 30],
            SECONDLY: [1, 2, 3, 5, 10, 15, 20, 30],
            MICROSECONDLY: [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000,
                            5000, 10000, 20000, 50000, 100000, 200000, 500000,
                            1000000],
        }
        self._byranges = [None, None, None,
                          range(0, 24), range(0, 60), range(0, 60), None]

... it seems to sort of work for short ranges of time, say a few days. But for longer ones, I'm not sure how to persuade rrule to use reasonable tick locations.

For example plotting a range of 70 days (95 to 165), this would be nice:
70D_nice

But the rrule based locator creates this:
70D_ADL_bad_start

I don't know how to tell rrule to consider 100 a nice number. It usually only goes by days of a month and so on. Here it seems to simply start somewhere at the beginning of the view limits for lack of a better alternative.

Also, I haven't yet figured out how to make .get_locator() of the AutoDateLocator completely disregard YEARLY and MONTHLY intervals. The above code doesn't achieve that. It does not seem possible to completely skip both these frequencies by simply changing the intervals, maxticks and ranges.

I'm not sure whether this can be accomplished without multiple changes to AutoDateLocator and RRuleLocator, and whether this can be done within the same methods (without making it very complicated). And then I'm no longer sure whether this would be better than having completely separate locators.

One more question I have is about the general aim for the number of ticks. Am I right that the AutoDateLocator aims for as many ticks as possible within maxticks, and that minticks is considered the absolute minimum but not a desirable number?
I had previously thought of this exactly the other way round.

@jklymak (Member) left a comment:

Let's get the date2num settled first. I'm not 100% sure I understand the other points. It could indeed be the case that we need special locators and formatters, but hopefully not ;-)

Thanks...

Comment on lines 429 to 432
if np.issubdtype(d.dtype, np.timedelta64):
# numpy converts this to datetime but not the way we want
d = d.astype('timedelta64[us]')
elif not np.issubdtype(d.dtype, np.datetime64):
jklymak (Member):

I don't think you need this - you end up doing it below:

Suggested change:

- if np.issubdtype(d.dtype, np.timedelta64):
-     # numpy converts this to datetime but not the way we want
-     d = d.astype('timedelta64[us]')
- elif not np.issubdtype(d.dtype, np.datetime64):
+ if not np.issubdtype(d.dtype, np.datetime64):

theOehrly (Contributor, Author):

The problem is the following:

>>> d = np.timedelta64(1, 's')
>>> d.astype('datetime64[us]')
numpy.datetime64('1970-01-01T00:00:00.000001')

This conversion should not happen accidentally. So np.timedelta64 objects should be kept out of the block that deals with converting the non-native numpy objects. This is probably better though:

Suggested change:

- if np.issubdtype(d.dtype, np.timedelta64):
-     # numpy converts this to datetime but not the way we want
-     d = d.astype('timedelta64[us]')
- elif not np.issubdtype(d.dtype, np.datetime64):
+ if not (np.issubdtype(d.dtype, np.datetime64) or
+         np.issubdtype(d.dtype, np.timedelta64)):

This then requires a minor change when converting timedelta to datetime below.

# numpy converts this to datetime but not the way we want
d = d.astype('timedelta64[us]')
elif not np.issubdtype(d.dtype, np.datetime64):
# handle everything that is not np.timedelta64
jklymak (Member):

Suggested change:

- # handle everything that is not np.timedelta64
+ # handle everything that is not np.datetime64

theOehrly (Contributor, Author):

Suggested change:

- # handle everything that is not np.timedelta64
+ # handle everything that is not np.datetime64/np.timedelta64

# try conversion to datetime or timedelta
for _dtype in ('datetime64[us]', 'timedelta64[us]'):
try:
d = d.astype(_dtype)
jklymak (Member):

I don't really understand this. It doesn't fail for native numpy. Are you trying to say it fails for pandas? Is that just a pandas bug? Can you provide an example of a pandas NaT series that fails? Because this works fine with pure numpy:

d = np.arange(0, 24, dtype='timedelta64[s]')
d[20] = np.timedelta64('NaT')
d.astype('timedelta64[us]')

If this is really just a pandas bug, I'd kind of prefer they fix it rather than us making more special cases...

theOehrly (Contributor, Author):

All of the following work:

np.array((datetime.datetime(2021, 1, 10),)).astype('datetime64[us]')
np.array((datetime.timedelta(days=1),)).astype('timedelta64[us]')
np.array((pandas.Timestamp('2017-01-01T12:13'),)).astype('datetime64[us]')
np.array((pandas.Timedelta(days=1),)).astype('timedelta64[us]')

And we could do isinstance(d[0], datetime.datetime). But we can't do this for the two pandas datatypes, as we can't import pandas. #4000 (comment)

Therefore the try...except block: we can't check whether the data is Timestamp or Timedelta, so we just try it.
This will work for all cases except if the data contains pandas.NaT or is completely unsupported. Only then the else block will be used. And apparently, there's no way to check whether an object is pandas.NaT without importing pandas: pandas-dev/pandas#9253 (comment)

Your example with np.timedelta64 should never reach this part of the code.
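A common workaround for this kind of situation (shown here only as an illustration of the idea, not necessarily what this PR ends up doing) is to look pandas up in sys.modules instead of importing it, since pandas objects can only exist if the user has already imported pandas:

```python
# Hypothetical helper: detect pandas.NaT without importing pandas ourselves.
import sys

def _is_pandas_nat(value):
    pd = sys.modules.get('pandas')   # None if the user never imported pandas
    return pd is not None and value is pd.NaT   # pd.NaT is a singleton
```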

jklymak (Member):

Sorry, I just meant: can you provide me with an example that fails? I'm not familiar enough with pandas to know exactly what you are trying to do.

Overall I spent quite a bit of effort simplifying this code. If at all possible it would be nice to keep it simple.

theOehrly (Contributor, Author):

Yes, sorry. I'm not sure I understand what you mean by an example that "fails". Do you mean data that cannot be converted with .astype? Then an example would be:

import pandas as pd

# doesn't work
td = (pd.Timedelta(days=1), pd.Timedelta(days=2), pd.NaT)
date2num(td)

# doesn't work either
date = (pd.Timestamp(year=2020, month=1, day=10),
        pd.Timestamp(year=2020, month=1, day=11),
        pd.NaT)
date2num(date)

PS: sorry for the delay, I responded (or thought I did) and managed not to respond at the same time...

jklymak (Member):

Thanks!

PS: sorry for the delay, I responded (or thought I did) and managed not to respond at the same time...

Happens to me all the time - so many excellent email responses in the Drafts folder for months.

@theOehrly (Contributor, Author) commented Jan 14, 2021 (edited):

I'm pretty sure that two NaT's won't happen for pandas. I skimmed over the roadmap and a linked proposal for a new type system in pandas. It looks like the plan is to actually only have one "not-a-value" type for everything.

jklymak (Member):

The value of pandas.NaT is actually defined as the minimum value of numpy's int64. This can be used to identify and convert it. I have tried it and it is arguably much cleaner and certainly more concise than the current converter function I proposed.

What does numpy do? If it's the same, then that is totally fine. If it's different, I guess there is a tiny risk someone will someday have a very large negative timedelta64, but I agree that risk seems small.

theOehrly (Contributor, Author):

As far as I can tell, numpy cannot (or does not) convert pandas' NaT.

jklymak (Member):

Sure, but numpy has a timedelta64('NaT').

theOehrly (Contributor, Author):

At least internally they're doing the same:

/* The special not-a-time (NaT) value */
#define NPY_DATETIME_NAT NPY_MIN_INT64

@theOehrly (Contributor, Author):

I've tried to clean this up a bit. Now there's no trial and error involved anymore to figure out dtypes.

The special conversion for pandas.NaT is more concise now. I think that np.iinfo('int64').min should always be the same? So comparing against that is a reliable way to do this, in my opinion.
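A minimal sketch of that check with plain NumPy (variable names are made up; extending the same comparison to pandas.NaT relies on the int64 sentinel being shared, as discussed above):

```python
import numpy as np

_NAT_INT = np.iinfo('int64').min   # sentinel value used for NaT

d = np.array([0, 1, 2], dtype='timedelta64[s]')
d[1] = np.timedelta64('NaT')
nat_mask = d.astype('int64') == _NAT_INT   # -> array([False,  True, False])
```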

@theOehrly (Contributor, Author):

I have two questions. First, is there a reason why conversion is always done to datetime64[us] specifically in date2num, and not to datetime64[ms] for example? Because in _dt64_to_ordinalf the values are converted again anyway.

Second, about the locators. If a minimum and a maximum number of ticks is specified and multiple intervals can give a number of ticks within that range, which interval should be chosen? The one that gives more ticks or the one that gives fewer ticks?

@jklymak (Member):

I have two questions. First, is there a reason why conversion is always done to datetime64[us] specifically in date2num, and not to datetime64[ms] for example? Because in _dt64_to_ordinalf the values are converted again anyway.

[us] is what we guarantee being able to convert to (it matches datetime's resolution). If we converted to [ms] we would lose possible precision.

All the fussing in _dt64_to_ordinalf is there to preserve as much floating point precision as possible. It's possible that could be simplified, but I think I tried and it broke some tests...

Second, about the locators. If a minimum and a maximum number of ticks is specified and multiple intervals can give a number of ticks within that range, which interval should be chosen? The one that gives more ticks or the one that gives fewer ticks?

I guess more? I think it's always better to err on the side of giving the user too many ticks, which they can then weed out by adjustment, rather than too few, where they don't know what ticks are there. But I'm not 100% sure. Some of the number-of-ticks logic is a bit arbitrary.

@theOehrly (Contributor, Author):

I do have another question, this time about the RRuleLocator, specifically about this part here:

def tick_values(self, vmin, vmax):
    delta = relativedelta(vmax, vmin)
    # We need to cap at the endpoints of valid datetime
    try:
        start = vmin - delta
    except (ValueError, OverflowError):
        # cap
        start = datetime.datetime(1, 1, 1, 0, 0, 0,
                                  tzinfo=datetime.timezone.utc)
    try:
        stop = vmax + delta
    except (ValueError, OverflowError):
        # cap
        stop = datetime.datetime(9999, 12, 31, 23, 59, 59,
                                 tzinfo=datetime.timezone.utc)

This needs to be done a bit differently for timedeltas to allow ticks on interval multiples. But I want to be sure to fully understand why this is done this way specifically for dates. Why is the delta added/subtracted? Why does the range need to be extended at all? Is it just convenient to use the delta, or is there some very specific reason for it?

@jklymak (Member):

I don't particularly know. It looks like it is just to set the clipping? Maybe the rule needs to be set wider than the between values below? Certainly vmin and vmax need to be capped.

You might try simplifying this in a different PR (i.e. as a smaller change) and see if it breaks any of the tests.

@theOehrly (Contributor, Author):

I hadn't intended to change this for dates, but to have a subclass for timedeltas that overrides this (partly, at least). But I'm not sure about that yet. Maybe a cleaner way can be found. I guess I will have to play around with this some more.
The Daily locator for timedelta is a similar special case to the Yearly date locator. And the Microsecond locator is currently another special case for both. This is not ideal.

@theOehrly (Contributor, Author):

I came across two things that are potentially a bit of an issue. Both are caused by converting timedelta to datetime by adding the epoch.

First, this "conversion" means that with the default epoch, the largest valid negative timedelta is much smaller in magnitude than the largest valid positive timedelta. It's somewhat obvious, but I only thought of this now. Basically, the range of timedelta is constrained by the maximum datetime value, which is the year 9999. We set the zero point at 1970 with the default epoch. Therefore the range of valid timedeltas is about -1970 years to +8029 years. This is not very intuitive for a user, I think.

The second problem concerns the locators and conversion to floating point values. The existing locators for microseconds, seconds, minutes and hours work fine with timedeltas, but only as long as the epoch is set to a full day. dates.set_epoch also accepts something like "2020-05-07T12:43" as a value for the epoch, though. And if you set this, the locators will fail to determine proper tick locations (for timedelta only). I don't know why anyone would do that, but still, it can be done.

I thought about having the epoch that is used as an offset not be the epoch that is set by the user, and instead maybe use a fixed offset of "5000-01-01" to fix both these problems. I haven't spent much time trying this yet, though. So far, it didn't work well.

Any thoughts on these two things?

@jklymak (Member):

I have tried to work around datetime's issues a couple of times, but I don't think it's worth the effort. As @efiring will point out, Python dates are not particularly reliable, missing leap seconds and other esoterica over long timescales. I don't think timedeltas on the scale of hundreds of years are a particularly useful thing to support, except as best we can via datetime (and really rrule's limitations). A user who is dealing with 10^6 days likely does not care exactly when the start of the month was 2000 years ago, and most such users typically use non-Gregorian calendars anyway.

The situation before changing the epoch to 1970, which is quite recent, was that we only allowed positive years, and folks have managed. If this were to be truly improved, I think someone would need to write a numpy version of rrule that allowed a much bigger date range.

@theOehrly (Contributor, Author):

I certainly prefer it if I don't need to work around limitations set by Python's datetime. I also don't think that supporting ranges of millions of days is very important. But I wanted to know your opinion on this.

As for not setting the epoch with minutes or seconds: do you think this can be fixed by simply adding a comment in the documentation for set_epoch? Or should the timedelta locators just always work, even if somebody were to set the internal epoch to something like "2020-05-18T21:15:05.125"? I see no reason why that is necessary or why anyone would do that.

@jklymak (Member):

As for not setting the epoch with minutes or seconds: do you think this can be fixed by simply adding a comment in the documentation for set_epoch?

Sorry, I'm not 100% following here - are strange epochs hard to handle somehow?

@theOehrly (Contributor, Author):

It's not about weird epochs but about having no zero point (literally).
The problem occurs because timedelta is treated as date internally.

I'll try to explain it with an example.

Epoch                 | Data Value      | Internal Value
1970-01-01T00:00:00.0 | 3 days, 4:15:00 | 1970-01-04T04:15:00.0
2021-03-05T16:14:27.0 | 3 days, 4:15:00 | 2021-03-08T20:29:27.0

The data value is what the user wants to have plotted.
The internal value is what is passed to RRule after adding the epoch for conversion to date.

The values for hours, minutes,... differ between data value and internal value if the epoch is 'weird'.

You can still get ticks every, say, 15 minutes just fine.
But those ticks will no longer be at 0, 15, 30, 45.
RRule will choose the interval multiples based on the internal value (date).
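To make the table concrete, a small sketch with plain datetime (values taken from the table above):

```python
# Sketch: the same timedelta anchored at two different epochs. With the default
# epoch the minutes stay on a 15-minute grid; with a "weird" epoch rrule's
# multiples (:00, :15, :30, :45) no longer match the data.
import datetime

td = datetime.timedelta(days=3, hours=4, minutes=15)

datetime.datetime(1970, 1, 1) + td
# -> datetime.datetime(1970, 1, 4, 4, 15)

datetime.datetime(2021, 3, 5, 16, 14, 27) + td
# -> datetime.datetime(2021, 3, 8, 20, 29, 27)
```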

Locating on days works fine because days need special treatment and cannot use rrule's byranges anyway.

Part of the problem is that reusing the date functionality means that a function is not aware whether it is working with date or timedelta values.

The best idea I had so far is to only allow modification of year, month and day of the epoch.
Or only add a warning to not break backwards compatibility.

Some other ideas I had (none of which I consider good):

  • Use a separate, fixed offset epoch for converting to timedelta.
    Problem: num2date can't know whether it receives a date or timedelta as a number.
    Similarly if a num2timedelta function existed, locators wouldn't know which one to call.

  • Somehow subtract a fixed epoch from the user set epoch to get the offset;
    use this to correct the values at some appropriate point.
    I couldn't make this work in some quick experiments so far.

  • Make all objects aware of whether they are working with dates or timedeltas currently;
    pass that information on separately with the data values.
    This would introduce lots of extra code and make everything complex and difficult to understand.

@jklymak (Member):

@theOehrly sorry to let this languish. I was just thinking about timedelta today and realized this was still here.

It seems you got hung up on strange epochs. I hope no one is planning to set the epoch to anything other than Jan 1 XXXX T00:00:00.0000. If they do, then they are welcome to whatever breakage they get. If your point is that maybe we shouldn't allow arbitrary epoch values, that is probably true, but it is a little late for that.

@theOehrly (Contributor, Author):

@jklymak Yes, this was just a major unknown for me here. If strange epochs need to be supported, then the whole approach as it currently is, is probably not feasible.
If the requirement can be made that the internal epoch needs to be set to Jan 1 XXXX T00:00:00.0000, then everything should more or less work in its current state. I'd need some more time to look at it again to tell you more precisely which points still need to be addressed from my point of view (if any).

Everything is still a bit convoluted, though. It's not immediately obvious that timedelta is just treated like a date internally.
There are separate formatters, but the locators are shared - except for the DayLocator, which can't be shared between date and timedelta at all. And the AutoTimedeltaLocator needs to be separate, but can be a subclass of the AutoDateLocator to override just some of the logic.
I think this is mostly inherent to the approach of treating timedelta as date. It mostly works without problems, but you get some additional edge cases.

@jklymak (Member):

Well, certainly you could go back to the approach of having a whole new timedelta unit. Leaving aside the complication of the epoch (again, I have little sympathy for random epochs), the issue here is time deltas longer than a week. Perhaps someone should define what they want to have happen if the timedelta is 50 days.

@theOehrly (Contributor, Author):

@jklymak The problem is that I also agree with your original concern that the completely separate approach introduces too much duplicate code. I'm not fully convinced by either version, to be honest. But currently I don't really have enough time to do useful work on this.

I don't really understand what you mean by defining what should happen with longer timedeltas. How they are represented? As days, weeks, years? I think days is the largest possible unit that makes sense. Weeks could maybe be used too, but in my opinion that's less intuitive. Everything longer than that doesn't have a fixed length anymore and would therefore be ambiguous.

@jklymak (Member):

Hi @theOehrly, just pinging again about this. If you feel swamped, perhaps I could take it over (keeping you as a co-author)? I think we should add something like this, but I'd have to play with it some, as you have. However, it's not urgent, so if you plan to take another pass at it, that would be fine as well.

@theOehrly (Contributor, Author):

@jklymak Hi, I actually haven't forgotten about this. I'm currently finishing my bachelor's thesis and should have substantially more time again from early February on. I was thinking about taking another look at this, among other things.

We should probably define a bit more clearly what we expect this to achieve, what should be supported and what not. And then I can have a fresh look at it. You'd probably prefer this to be split into smaller chunks too, instead of one large PR. That might be somewhat tricky. But I guess things like support for pandas NaT without importing pandas could be separated out, at least.

@jklymak (Member):

Great! Just wanted to make sure it was on your radar. I agree that making datetime.timedelta and numpy.timedelta64 work should be the priority. However, I don't think the length here is too onerous; it's just the very similar code that seems to say we need a refactor somewhere.

@theOehrly (Contributor, Author):

Hi @jklymak, I've just started to look into this again.

You said this some time ago:

[...] the issue here are time deltas longer than a week. Perhaps someone should define what they want to have happen if the timedelta is 50 days.

Thinking a bit about it, I suppose that the largest "unit" that we can support for timedelta is "weeks". Everything longer just needs to be a multiple of that. A minor problem that I see here is that I'd probably prefer "100 days" over "15 weeks" in most cases. So this should be easily customizable.

If someone wants timedelta values with ticks at "1 month", "2 months", ..., we need a reference starting date, so that we can determine how long each specific month actually is. At that point, all values should just be datetime, in my opinion. Ticks like that could then be achieved with a month locator and a custom tick formatter. I think this is a separate issue, if we wanted to support it natively.

If we can only support timedelta units with constant length, then we should really use a subclass of the MultipleLocator, I'd say - so, something closer to what I had originally done. (A rough sketch of what I mean is shown below.)
In my opinion, this reduces the overall complexity and makes the implementation more understandable. Also, in terms of functionality, I can't think of any advantage that the approach with conversion to datetime can offer. There's the disadvantage that it introduces a somewhat arbitrary limit on the largest timedelta value that we could handle. Granted, with the limit being at a few thousand years, few people would ever run into this problem, but still.
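A rough sketch of what that could look like (class name, candidate steps and the tick-count heuristic are all made up for illustration; this is not the PR's code):

```python
# Sketch: pick a "nice" fixed step (in days, the internal float unit) based on
# the current view span, then let MultipleLocator place the ticks.
from matplotlib import ticker

# candidate steps in days (illustrative values only)
_NICE_STEPS = [1/1440, 5/1440, 15/1440, 30/1440,   # 1, 5, 15, 30 minutes
               1/24, 3/24, 6/24, 12/24,            # 1, 3, 6, 12 hours
               1, 2, 5, 10, 20, 50, 100]           # days

class SimpleTimedeltaLocator(ticker.MultipleLocator):
    def __init__(self, maxticks=9):
        super().__init__(base=1.0)
        self._maxticks = maxticks

    def __call__(self):
        vmin, vmax = self.axis.get_view_interval()
        span = abs(vmax - vmin)
        # smallest nice step that keeps the number of ticks below maxticks
        step = next((s for s in _NICE_STEPS if span / s <= self._maxticks),
                    _NICE_STEPS[-1])
        self.set_params(base=step)
        return self.tick_values(vmin, vmax)
```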

Did you have any more ideas or think of anything that we should consider here?

@theOehrly (Contributor, Author):

@jklymak, just pinging you again as a quick reminder that this is still here.

I think we "just" need to determine what the proper way to implement this is. Most importantly for that, we need to agree on what this should be able to achieve. When we have that figured out, I could also try to do a bit of a concept specification where the constraints and expectations are made clearer, if you think that'd be helpful (I don't know how I'd do that yet, but we could figure that out too).

The github-actions bot added the "status: inactive" label (marked by the "Stale" GitHub Action) on Sep 27, 2023.
@jklymak (Member):

Sorry this has languished. I recently wanted this facility, and wished that we had it. @theOehrly are you happy to review this version, or should we start over? If I were starting over, I may just suggest we convert to float, where the float is in the units of the timedelta64, and then just plot as normal floats. But I can also see the appeal of having days/hours/minutes...

@theOehrly (Contributor, Author) commented Sep 27, 2023 (edited):

@jklymak I can review, update or rewrite this PR if you'd like to have this feature implemented in this way. I should be able to get this done sometime in October, if you want.

I guess the question is whether you want days/hours/minutes/... or just the float version that you mentioned. I suppose there are a few things to consider here:

  • code complexity and maintenance: prefer floats
  • user preference: difficult to tell, likely differs throughout the user base
  • flexibility and ability to create plots for a non-technical audience: prefer the days/hours/minutes/... version
  • consistency and feature parity: the extended version as well, to match datetime support

I may just suggest we convert to float, where the float is in the units of the timedelta64

One note about this: from my (limited) experience with people who only use Python + Matplotlib as a tool in the scientific community and industry, but are not very software-affine, they are often unaware that there is a difference between timedelta64[ns], timedelta64[s], .... So this might lead to some confusion.
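A small illustration of that pitfall (plain NumPy, made-up values):

```python
import numpy as np

a = np.array([90], dtype='timedelta64[m]')   # 90 minutes
b = a.astype('timedelta64[s]')               # the same duration, stored in seconds

a.astype(float)              # -> array([90.])
b.astype(float)              # -> array([5400.])
a / np.timedelta64(1, 'h')   # -> array([1.5])  (explicit unit, no ambiguity)
```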

Two years ago, I already created a module that patches full timedelta support into Matplotlib (see here: Timple). In case Matplotlib decides to implement the float-only version, I'll still maintain that module for the foreseeable future. It doesn't have many users on its own (I think), but my main project FastF1 depends on it, and a large part of those users indirectly use Timple (with auto-locators and auto-formatters) because a large part of the data there consists of Timedelta values.

@jklymak (Member):

I agree with your summary above.

I added this to the weekly Matplotlib meeting. @theOehrly you are definitely invited to attend if you like: https://hackmd.io/l9vkn_T4RSmk147H_ZPPBA

My thinking for floats versus full datetime-like support is that it is quite easy to make a timedelta64 into a datetime64 by adding a starting datetime64. Then the users get all the flexibility of our datetime64 support with no added code.

td = np.arange(100, dtype='timedelta64[s]')
dt = td + np.datetime64('2000-01-01')

Matplotlib can't do that automatically, because we don't know when the user started their experiment. The difficulty with this in the past has also been that some timedeltas are indeed nanosecond scale, and because we drop from datetime64 to datetime, we lose the resolution.

The advantage of coercing to float by default is that the user gets something rather than an error. If their data is indeed at the nanoscale, they need never mess with our datetime code. Changing units is also straightforward, if a bit annoying, and is the way users could coerce to float anyhow:

td = td / np.timedelta64(1, 'h')

@theOehrly (Contributor, Author):

@jklymak I should be able to participate in the meeting. And I'll try to give this some more thought until then.

The github-actions bot removed the "status: inactive" label (marked by the "Stale" GitHub Action) on Sep 29, 2023.
@theOehrly (Contributor, Author) commented Oct 23, 2023 (edited):

For now, I have only rebased onto the current main. Nothing else has been changed so far.

Also, here's the answer to one of the questions that came up during the meeting, about how this is handled by Pandas at the moment. It was noted that their implementation for timedelta plotting is very concise. Here's the reason:

  • Pandas does not use nice tick intervals for Timedelta

  • How their implementation works:

    • All timedelta-like types are already converted to pandas.Timedelta in the
      DataFrame or Series. Therefore, their plotting code only needs to handle this
      one data type for timedelta-like values.
    • Pandas passes the timedelta values to Matplotlib as numpy arrays with
      dtype timedelta64[ns].
    • Matplotlib's AutoLocator is used, which explains the bad tick locations.
    • Pandas registers a custom Timedelta formatter that creates a tick string as 'D days HH:MM:SS.F'.
      Hours, minutes and seconds are always shown. Days are only shown when necessary.
      Microseconds are only shown when necessary. The formatter automatically determines the number of
      significant digits that needs to be shown. The formatter is not adjustable by
      the user.

    Overall, their implementation is not as capable or adjustable as the implementation we'd be aiming for.

  • Important: Adding our new implementation breaks pandas' Timedelta formatting.
    It doesn't throw an error, but the tick labels are incorrect. Here we might need to
    coordinate with Pandas, given the size of their userbase.

@theOehrly (Contributor, Author):

@jklymak, here's an update on the progress of this PR.

The implementation of all features is done. As far as features are comparable, there is basically feature parity with the datetime support. Tests are all done, from my point of view; coverage is near 100%. This part is ready for review. If you have time, you or someone else can already have a look at the code and tests.

Documentation is still a bit of a work in progress. I've started to restructure the whole documentation for matplotlib.dates slightly. Some feedback here would be appreciated.
All classes/methods are documented at the moment, but there is still some room for improvement. Some small examples in the docstrings still need to be added. Also, I will go over everything again and maybe improve how some things are written. There are no gallery examples so far; this is still TBD.

I haven't added any type hints so far. Type hints through .pyi files seem to be (fairly) new in Matplotlib? Most modules seem to have a .pyi file, but not matplotlib.dates. Can you give me some guidance here on what to do/what is expected? Should I add type hints for the whole module?

The string formats for the ConciseTimedeltaFormatter may still need to be discussed. I'm not entirely happy with the intuitive readability of the labels. Given that this is quick to change at any point, I'd wait with discussing this until I have created a gallery example that shows off this formatter. Then there will be a visual example based on which we can discuss the readability of the labels.

The partially custom formatting codes for timedelta are done. They are a subset, with extensions, of the standard datetime formatting codes. The supported codes are fully documented, with notes where they differ from the datetime standard. Some feedback here is appreciated as well.

We discussed that locating ticks on weeks is not desired in most cases and that we will stick with days as the largest unit. Accordingly, I've removed weekly locating from the auto-locators. It is still available to be configured manually through the manual TimedeltaLocator, because why not. This way, a user can still easily get weekly ticks when actually desired.

It was suggested that the timedelta support be marked as a preliminary/experimental part of the API for now. How would you like this to be represented in the documentation? Should I add a .. note:: or .. warning:: for every new function and class? I fear that this might be a bit heavy visually. On the other hand, we wouldn't want this to be easily missed by a user. Is there a better alternative?

Reviewers: @jklymak (left review comments). At least 1 approving review is required to merge this pull request.
Participants: @theOehrly, @jklymak, @dstansby
