NotificationsYou must be signed in to change notification settings
Fork7.9k
Star21.4k

Re-write sym-log-norm#16391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Closed

dstansby wants to merge8 commits intomatplotlib:masterfromdstansby:symlog-overhaul

Closed

Re-write sym-log-norm#16391

dstansby wants to merge8 commits intomatplotlib:masterfromdstansby:symlog-overhaul

Conversation

Copy link

Member

dstansby commentedFeb 2, 2020•
edited
Loading

This is work I started a while ago to re-write SymLogNorm into code I could understand and read. As I wrote this I realised that the original code was just plain wrong... I have now added tests include values that are easy to manually calculate and verify. Suggestions for more tests welcome.

Fixes#16376

dstansby added this to thev3.2.0 milestone

Feb 2, 2020

Re-write sym-log-norm

73ae1d3

dstansby force-pushed thesymlog-overhaul branch from9b81d13 to73ae1d3Compare

February 2, 2020 20:43

Copy link

Member

jklymak commentedFeb 2, 2020

Suggest we let the base be an option. We will need to know the base if we make a scale for the colorbar

The docstring probably needs to change as well. The linear region using is not exactly two decades which I think is done to make the derivative continuous.

I was working on this and found the existing test should be improved to actually test some numbers we could calculate by hand rather than three random numbers.

I’m happy for you to work on this but I would like to see substantive improvements to the tests and the documentation.

After we fix this we can get back to fixing the broken colorbars which I’d propose fixing by associating a scale with the norm. Any norms that don’t have a scale could fallback to the manual ticking.

Copy link

Member

jklymak commentedFeb 2, 2020

This was done with a modified SymLogNorm that takes a base. So for sure, using base = np.e is different than base=10. Essentially a "decade" in np.e is smaller than one in base 10, and so the linear region is smaller.

Note however, that even the base=10 case does not yield equally spaced "ticks".

#scale = mscale.SymmetricalLogScale(ax.xaxis, linthreshx=1.0)fig, ax =plt.subplots()ax.set_xscale('symlog', linthreshx=1.0, basex=10)ax.plot(np.arange(-100, 100, 0.1), np.arange(-100, 100, 0.1))ax.set_xlim([-100, 100])scale = ax.xaxis._scalefig, ax = plt.subplots()for normbase in [np.e, 10]:    norm = mcolors.SymLogNorm(linthresh=1.0, linscale=1, vmin=-100, vmax=100, base=normbase)    xx = scale._transform.transform([-200, -100, -10, -1, -0.5, 0, 0.5, 1, 10, 100, 200])    print('xx', (xx - xx[1])/(xx[-2] - xx[1]))    nn = norm([-200, -100, -10, -1, -0.5, 0, 0.5, 1, 10, 100, 200])    print('nn', (nn - nn[1])/(nn[-2] - nn[1]))    print(norm([-200, -100, -10, -1, -0.5, 0, 0.5, 1, 10, 100, 200]))    ax.plot(nn, xx, '.', label=f'normbase {normbase:1.3f}')    ax.plot(norm([-100, 100]), xx[[1, -2]], zorder=0)

dstansby added3 commits

February 3, 2020 10:18

Allow specifying base

943a848

Disallow vmin!=vmax

60f0d69

Update tests

6fdd2e0

dstansby commented

Feb 3, 2020

View reviewed changes

lib/matplotlib/tests/test_colors.py

		vals = np.array([-30, -1, 2, 6], dtype=float)
		normed_vals = norm(vals)
		expected = [0., 0.53980074, 0.826991, 1.02758204]
		expected = [-0.842119, 0.450236, 0.599528, 1.277676]

Copy link

MemberAuthor

dstansbyFeb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think these values were just plain wrong before...

The first value (-30) is less thanvmax (5), so should come out as less than zero
The second value (-1) is less than 0, so should come out as less than 0.5

dstansby added2 commits

February 3, 2020 13:38

Add base docs

cc1fa93

Add API change

8904654

dstansby added the topic: color/color & colormaps label

Feb 3, 2020

dstansby added2 commits

February 3, 2020 13:56

Check a different base

3fb1dcf

Fix color test

82052d8

Copy link

Contributor

greglucas commentedFeb 3, 2020

The problem I see here is that the code found inscale for SymLog is basically going to be the exact same as the code incolors for SymLog because the math is the same. This is where I think taking a deep look at where the math should be placed would be helpful.

My thought is that there should be one, and only one, mathTransform for SymLog. This Transform is essentially just a functiony = f(x).

Scales could inherit from thatTransform to identify/place ticks.
Norms could inherit from thatTransform to set limits and scale values to be between 0 and 1.
A Norm could also inherit from the Scale if that's what is needed for colorbars because the Scale would have the transform already. Regardless, I think it would be helpful to link these items closer together in implementation.

This would guarantee consistency between the underlying math function chosen.

Copy link

Member

jklymak commentedFeb 3, 2020

@anntzer is working towards that. We could also change this to useSymmetricLogTransform as a stopgap until the general solution. But we really should fix this ASAP as folks may be publishing nonsense using the Norm and not realizing it. A proper refactorcould wait until after that.

Copy link

MemberAuthor

dstansby commentedFeb 3, 2020

I think this is ready for review. Particularly interested in opinions on

Forcingvmin = -vmax. I have no idea what to do if this isn't the case, so for now I'm just enforcing it.
Any extra tests that could/should be added.

jklymak reviewed

Feb 3, 2020

View reviewed changes

Copy link

Member

jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This implementation isdifferent than what we had before, even forbase=np.e. I don't have any skin in the game whether this one isbetter orworse but it needs more than this API note.

It is also inconsistent withSymmetricalLogTransform, and henceSymmetricalLogScale, so that needs to be cleaned up or we can't make a proper axes for it when it gets turned into a colorbar. My understanding is that the current implementation has a smooth derivative at the transition. Does this new implementation? Do we care about that? Not sure we do...

Overall, I think this requires its own gallery page clearly explaining the properties of the transform, or cite a reference that explains exactly what the algorithm here is. People need to be able to cite what this thing is in their papers.

lib/matplotlib/tests/test_colors.py

		normed_vals = norm(vals)
		assert_array_almost_equal(normed_vals, expected)
		expected = [0, 0.25, 0.5, 0.75, 1]

Copy link

Member

jklymakFeb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't think this is consistent withSymmetricLogTransform yet. I'm OK if we want to change that as well to be consistent with this code, but they can't be inconsistent!

importmatplotlib.scaleasmscaletrans=mscale.SymmetricalLogTransform(10,1,1)new=trans.transform([-10,-1,0,1,10])new= (new-new[0])/ (new[-1]-new[0])print(new)

[0.0 0.23684210526315788 0.5 0.7631578947368421 1.0]

I'm fine if this implementation is desired, but then we need to changeSymmetricLogTransform.

Copy link

MemberAuthor

dstansby commentedFeb 3, 2020

Thanks for the comments. re.

My understanding is that the current implementation has a smooth derivative at the transition

I'm not sure what this means or how it could be implemented? Totally agreed that there should be a gallery example going through this carefully.

dstansby added the status: work in progress label

Feb 3, 2020

Copy link

Member

jklymak commentedFeb 3, 2020

I'm not sure what this means or how it could be implemented?

I mean, I think thats why the current version is the way it is.https://iopscience.iop.org/article/10.1088/0957-0233/24/2/027001
seems the same and thats why they claim they do it.

Copy link

MemberAuthor

dstansby commentedFeb 3, 2020

Ah nice - thoughts on using the approach in that paper (open accesshere) instead of (what I am interpreting as) the current approach?

I think I'm pro using the published one because

It's a published thing to link to
It has a simple analytic form

Copy link

Member

jklymak commentedFeb 3, 2020

Well, I prefer yours because its simpler and does what we say it should. But open to using either so long as its explained clearly...

Copy link

Member

jklymak commentedFeb 4, 2020

@dstansby we discussed this on the call; something like this could/should go in v3.3. For 3.2, we need to add a base kwarg, and deprecate np.e as the base so the default is consistent with our docs and what most folks have probably been assuming. I'll open a PR for that right now.

jklymak mentioned this pull request

Feb 4, 2020

FIX: add base kwarg to symlognor#16404

Merged

6 tasks

Copy link

Contributor

anntzer commentedFeb 4, 2020

I wonder whether we really need to support a symlog norm and whether we could consider deprecating it instead. We could also consider moving it to a separate package, similar to mpl-probscale.
Especially now that we have FuncScale, it is "relatively" easy for the user to reimplement it if they really need it; conversely, it is a very non-standard scale that we should probably not be encouraging people to use. Note that the linked paper (which is also cited by the Matlab Central implementation of symlog) actually cites matplotlib as its first example for symlog, so it's a bit circular for matplotlib to refer back to it to justify the symlog scale :-) (even though I agree that ensuring continuity in the gradient seems a reasonable criterion). Also, that paper has been cited a grand total of 19 times in 7 years, which is notnothing but hardlya lot for a paper proposing something as fundamental as a new plot scaling function (the number of log-scale or power-law plots published every year is certainly orders of magnitude more than that).
If you really have "log-scaled" data with both signs, you can e.g. plot the absolute value in log scale, using e.g. different colors for the positive and negative sides. See e.g. the log(Gamma) plot by Wolfram (certainly a reasonable reference point...) athttp://functions.wolfram.com/GammaBetaErf/LogGamma/introductions/Gammas/ShowAll.html, which effectively plots log(abs(Gamma)) and separately the imaginary part.

Copy link

MemberAuthor

dstansby commentedFeb 4, 2020

@dstansby we discussed this on the call; something like this could/should go in v3.3. For 3.2, we need to add a base kwarg, and deprecate np.e as the base so the default is consistent with our docs and what most folks have probably been assuming. I'll open a PR for that right now.

Sure; regardless of cosmetic changes I still think the maths is throwing out incorrect numbers:#16391 (comment) If we're not going to overhaul in 3.2, I think we should put a big warning on it that the function is un-tested and suspected wrong.

Copy link

Member

jklymak commentedFeb 4, 2020

Judging by the tests, though, I don't think the original authors thought it was spitting out the wrong maths. Unfortunately they failed to write down what they thought the correct maths are.

That all said, overall I agree with@anntzer. This seems to be something we invented, versus something the scientific community is asking for. If it could be made its own package that would be fabulous.

But if we do persist in providing it in core, it has to be well-defined and documented. At the very least, the cited paper does that.

Copy link

Contributor

greglucas commentedFeb 4, 2020

I think that everyone brings up really good points about the use of this function and that it is very subjective as to what different people would want out of it (or whether they should be using it at all).
@anntzer, I agree with your point about plotting the magnitude in log space with two different colorbars for +/-. That is exactly what I want out of this function :) this is a convenient wrapper around that to combine with a diverging colormap. It is purely a convenience for me and used to make a subjective figure that gets across the idea that we have positive and negative data spanning orders of magnitude. I'm not opposed to deprecating the function, but I also think it is quite convenient to use.

My specific use-case... I have a vector field that spans orders of magnitude and then I dot that into some other vector path (integral ofv dotdx) to get a scalar field that has positive and negative values (spanning orders of magnitude) that depend on the direction of your integration path (dx). For the vector field, I write my own symlog vector scaling function that preserves angles, and I somewhat arbitrarily use a form close to what the referenced paper proposes.

def scale_vectors(x, y, scale=np.log10):    """Scale vectors while preserving the angle.        Parameters    ==========    x: Cartesian x coordinate(s)    y: Cartesian y coordinate(s)        scale: function to scale the magnitude by (Default: log10)"""    mag = np.sqrt(x**2 + y**2)    angle = np.arctan2(y, x)    newx = scale(1 + mag)*np.cos(angle)    newy = scale(1 + mag)*np.sin(angle)        return (newx, newy)

(note that I don't think the linked paper will work with vector quantities even though they say it is bisymmetric because of the changing angles, but I didn't read it that closely).

Copy link

Contributor

anntzer commentedFeb 4, 2020

Given that you already wrote your custom normalization (which is certainly the "responsible" thing to do), I think what Matplotlib should do is really just making sure that you can easily use it as scale/norm, not providing its own symlog?

Copy link

Contributor

greglucas commentedFeb 4, 2020

Yes, I agree, and your previous point about using my ownFuncNorm would certainly be the way to go for that. This is more for exploratory data analysis/ease of use. I'm really not opposed to getting rid of it from MPL, but since it is there and convenient, I currently use it.

Copy link

Member

jklymak commentedFeb 4, 2020

Back in the olden days of Matlab, we had one Norm and weliked it, and we just plotted our data using that linear norm. i.e. you transform the array first and then pcolor it. Surely thats good enough for exploratory data analysis:

X=myData()Y=mytransform(X)pcolormesh(Y)

Copy link

Contributor

greglucas commentedFeb 4, 2020

I don't disagree... ¯_(ツ)_/¯

The nicety in mpl is now adding to your simple example:colorbar(), which will show -5 to 5 and in my head I'll have to convert that scale or call invtransform on the label strings myself (rather than having mpl just automatically put in -10^5 to 10^5 for me)

Again, I'm not opposed to deprecation, and I think most people are in agreement that this is a pretty niche norm.

Copy link

Member

jklymak commentedFeb 4, 2020

Fully agreed, the point of using a Norm is that thecolorbar gives you proper numbers. But that means that each norm should have an associated scale, or the advantage is pretty much nullified.

Copy link

Member

jklymak commentedFeb 6, 2020

@anntzer, do you have more details on the proposed unification of scales and norms? I think this should be done ASAP so that most norms have a scale associated with it that the colorbar can just use. Will your proposed factory give us that?

Copy link

MemberAuthor

dstansby commentedFeb 6, 2020

In the meantime, are we decided on deprecating and removing symlognorm? I am happy to do this, with extensive documentation on alternatives.

Copy link

Member

jklymak commentedFeb 6, 2020

Well maybe a quick poll and try to get main developers to vote?

👍 keep SymLogNorm and symlogscale
👎 deprecate both

Copy link

Contributor

anntzer commentedFeb 6, 2020

My patch makes it possible to derive norms from scales (in which case the scale can known about the parent norm). Obviously you can still construct independent norms (e.g. BoundaryNorm) so notall norms will have an associated scale.

I think this should be done ASAP

Well,#14916 is not exactly recent :)

Copy link

Contributor

Tillsten commentedFeb 6, 2020

If you break symlog scale you break a lot of code I and a lot of other people in time-resolved spectroscopy use, please don't do that. Matplotlib is generally very careful about breaking code, so I don't see why this is not the case for scales.

As the original author of the retrospective faulty SymlogNorm, I don't care too much about it anymore. I fully agree with the found deficiencies, but my usage at the time of coding does not depend on exactly reproducing the absolute data form a colormap. Instead of the relative amplitude of the values is importent, something which is in some cases better reproduced by a symlog-scale than a linear scale. Note that the data has both positive and negative signals and varies quite a lot in its amplitude. Hence I never cared about the base, since the final differences in the map were not that visible since it the scale is normalized. Again, I retrospect this was wrong.

Copy link

rayosborn commentedFeb 8, 2020

I have only just noticed this PR and haven't had time to understand all the issues involved, but I would like to strongly urge (aka beg) that SymLogNorm is not deprecated. A new x-ray scattering method called ΔPDF generates both positive and negative probability maps, which are ideally viewed in symmetric (often log) plots, e.g., seeKrogstad, M. J. et al. Nat Mater 19, 63–68 (2020). We use MPL to plot with symmetric limits that are automatically enforced inNeXpy by choosing a divergent color map. If this is a niche, I think that it will be a growing one, since we are already collaborating with a number of research groups to produce ΔPDF data on a routine basis.

Copy link

Member

jklymak commentedFeb 9, 2020

@rayosborn as discussed on gitter, wouldarcsinh suit your needs? Not saying we are going that way, but its another option.

Copy link

Contributor

greglucas commentedFeb 9, 2020•
edited
Loading

I actually like thearcsinh suggestion, since it is a "simple" and explainable transform (although my guess is people still don't know what arcsinh is off the top of their head, I had to look it up myself:ln(z + np.sqrt(1+z^2)))

One other option I haven't seen mentioned yet would be removing the linear region completely and just making athreshold instead. This would be very simple to explain, log (of whatever base you want) everywhere. Withabs(x) < threshold getting mapped to the midpoint of the scale. This would be similar to using a log scale currently where you specify a vmin and everything less than that gets clipped to the minimum value.

Edit: I just implemented this and realized it works fine for colorbar normalization, but not for x/y plots with symlog scales due to the clipping creating hard cutoffs rather than smoother transitions.

Copy link

rayosborn commentedFeb 10, 2020

I'm not too bothered about what method is used to generate the symmetric "log" plots, since we don't analyze the images themselves. They are used to guide our interpretation and to present the data in talks and publications. If we fit any models, it is to the actual data, not the plotted representation.

Copy link

Member

jklymak commentedFeb 10, 2020

My concern is reproducibility. I’ve hand digitized a good number of Figures from papers where the data was no longer available, and if the scale was not clearly defined in the paper it would lead to errors.