This repository was archived by the owner on Aug 28, 2022. It is now read-only.
aiekick/alphanum

This readme is a copy of the page http://davekoelle.com/alphanum.html

The Alphanum Algorithm

People sort strings with numbers differently than software does. Most sorting algorithms compare ASCII values, which produces an ordering that is inconsistent with human logic. Here's how to fix it.

The available implementations of the algorithm:

  • Java: AlphanumComparator.java
  • C#: AlphanumComparator.cs
  • C++: alphanum.cpp
  • C++, not Windows dependent: alphanum.hpp
    License: MIT License - Free to use and distribute

    Special thanks to everyone who contributed fixes or new code!

    Use at your own risk... I can personally vouch only for the Java version.

    The Problem

    Look at most sorted lists of filenames, product names, or any other text that contains alphanumeric characters - both letters and numbers. Traditional sorting algorithms use ASCII comparisons to sort these items, which means the end-user sees an unfortunately ordered list that does not consider the numeric values within the strings.

    For example, in a sorted list of files, "z100.html" is sorted before "z2.html". But obviously, 2 comes before 100!

    Sorting algorithms should sort alphanumeric strings in the order that users would expect, especially as software becomes increasingly used by nontechnical people. Besides, it's the 21st Century; software engineers can do better than this.

    The Solution

    I created the Alphanum Algorithm to solve this problem. The Alphanum Algorithm sorts strings containing a mix of letters and numbers. Given strings of mixed characters and numbers, it sorts the numbers in value order, while sorting the non-numbers in ASCII order. The end result is a natural sorting order.

    Here's a list of sample filenames to illustrate the difference between sorting with the Alphanum algorithm and traditional ASCII sort. On the left is what you live with on a daily basis. On the right is what you could have, if more developers were motivated to sort lists as people would expect. Which list makes more sense to you? Which would be more comfortable to you as you're using an application?

    Traditional    Alphanum
    z1.doc         z1.doc
    z10.doc        z2.doc
    z100.doc       z3.doc
    z101.doc       z4.doc
    z102.doc       z5.doc
    z11.doc        z6.doc
    z12.doc        z7.doc
    z13.doc        z8.doc
    z14.doc        z9.doc
    z15.doc        z10.doc
    z16.doc        z11.doc
    z17.doc        z12.doc
    z18.doc        z13.doc
    z19.doc        z14.doc
    z2.doc         z15.doc
    z20.doc        z16.doc
    z3.doc         z17.doc
    z4.doc         z18.doc
    z5.doc         z19.doc
    z6.doc         z20.doc
    z7.doc         z100.doc
    z8.doc         z101.doc
    z9.doc         z102.doc

    How does it work?

    The algorithm breaks strings into chunks, where a chunk contains either only digits or only non-digit characters. The chunks of the two strings are then compared pairwise, in order. If both chunks are numeric, they are compared by numeric value. If either chunk is non-numeric, an ASCII comparison is used.
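The chunk-and-compare steps described above can be sketched in Python roughly as follows. This is a minimal illustration and not one of the listed implementations: the names `chunks` and `alphanum_compare` are hypothetical, the tie-break for strings that run out of chunks is an assumption, and Python's string comparison is by code point, which matches ASCII order only for ASCII input.

```python
import re
from functools import cmp_to_key

# A chunk is a maximal run of ASCII digits or a maximal run of anything else.
_CHUNK_RE = re.compile(r"[0-9]+|[^0-9]+")


def chunks(s):
    """Split s into alternating numeric and non-numeric chunks."""
    return _CHUNK_RE.findall(s)


def _is_numeric(chunk):
    # Chunks are homogeneous, so checking the first character is enough.
    return "0" <= chunk[0] <= "9"


def alphanum_compare(a, b):
    """Return a negative, zero, or positive number, like a classic comparator."""
    ca, cb = chunks(a), chunks(b)
    for x, y in zip(ca, cb):
        if _is_numeric(x) and _is_numeric(y):
            # Both chunks are numeric: compare by value.
            result = int(x) - int(y)
        else:
            # At least one chunk is non-numeric: plain string comparison.
            result = (x > y) - (x < y)
        if result != 0:
            return result
    # Assumption: when all shared chunks are equal, fewer chunks sorts first.
    return len(ca) - len(cb)


names = ["z100.html", "z2.html", "z10.html", "z1.html"]
print(sorted(names, key=cmp_to_key(alphanum_compare)))
# ['z1.html', 'z2.html', 'z10.html', 'z100.html']
```

Wrapping the comparator with `functools.cmp_to_key` lets it plug into `sorted` and `list.sort` without changes to the calling code.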

    There is currently a glitch when it comes to periods/decimal points - specifically, periods are treated only as strings, not as decimal points. The solution to this glitch is to recognize a period surrounded by digits as a decimal point, and continue creating a numeric chunk that includes the decimal point. If a letter exists on either side of the period, or if the period is the first or last character in the string, it should be viewed as an actual period and included in an alphabetic chunk. While I have recently figured this out in theory, I have not yet implemented it in the algorithms. To be truly international, the solution shouldn't just consider periods, but should consider whatever decimal separator is used in the current language.
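One possible way to express the proposed decimal-point rule is a chunking pattern that tries "digits, period, digits" before falling back to plain digit runs. The sketch below is only an illustration of that idea, not code from any of the listed implementations. The name `decimal_chunks` is hypothetical, and the period is hard-coded as the decimal separator, so the locale-aware part is left out.

```python
import re

# Assumption: "." is the decimal separator. The alternatives are tried in
# order: a decimal number, then a plain digit run, then a non-digit run.
_DECIMAL_CHUNK_RE = re.compile(r"[0-9]+\.[0-9]+|[0-9]+|[^0-9]+")


def decimal_chunks(s):
    """Split s into chunks, keeping a period between digits inside the number."""
    return _DECIMAL_CHUNK_RE.findall(s)


print(decimal_chunks("v1.5beta"))  # ['v', '1.5', 'beta']
print(decimal_chunks("z1.doc"))    # ['z', '1', '.doc']
print(decimal_chunks(".5"))        # ['.', '5']
print(decimal_chunks("5."))        # ['5', '.']
```

A period with a non-digit on either side, or at the start or end of the string, stays in a non-numeric chunk, which matches the rule stated above.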

    Currently, the algorithm isn't designed to work with negative signs or numbers expressed in scientific notation, like "5*10e-2". In this case, there are 5 chunks: 5, *, 10, e-, and 2.
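To see why signs and exponents are not understood, it may help to look at the chunks directly. The snippet below uses a hypothetical `split_chunks` helper (a simple digit/non-digit split, assumed here for illustration) and shows that a minus sign always lands in a non-numeric chunk.

```python
import re


def split_chunks(s):
    """Split s into maximal runs of ASCII digits and runs of non-digits."""
    return re.findall(r"[0-9]+|[^0-9]+", s)


print(split_chunks("5*10e-2"))  # ['5', '*', '10', 'e-', '2']

# The minus sign becomes part of the preceding text chunk, so "x-3" and
# "x-5" differ only in the numeric chunks 3 and 5. They are therefore
# ordered as 3 before 5, although -5 is the smaller value.
print(split_chunks("x-3"))  # ['x-', '3']
print(split_chunks("x-5"))  # ['x-', '5']
```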

    The latest version of some of the code (particularly the Java version) compares digits one at a time if the numeric chunks are the same size. For example, when comparing abc123 to abc184, 123 and 184 are the same size, so their values are compared digit-by-digit: 1=1, 2<8. This was done to solve the problem of numeric chunks that are too large to fit in the range of values allowed by the programming language for a particular datatype: in Java, an int is limited to 2147483647. The problem with this approach is that it doesn't properly handle numbers that have leading zeros. For example, 0001 is seen as larger than 1 because it's the longer number. A version that does not compare leading zeros is forthcoming.
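A digit-by-digit comparison that ignores leading zeros could look something like the sketch below. It is an assumption about how such a version might work, not the forthcoming code itself, and `compare_numeric_chunks` is a hypothetical name. Python integers do not overflow, so the technique matters mainly for languages with fixed-width integers; Python is used here only to illustrate the logic.

```python
def compare_numeric_chunks(x, y):
    """Compare two digit strings by value without converting them to integers.

    Returns a negative, zero, or positive number.
    """
    # Assumption: leading zeros carry no value, so drop them first.
    x_sig = x.lstrip("0")
    y_sig = y.lstrip("0")
    # A number with more significant digits is the larger one.
    if len(x_sig) != len(y_sig):
        return len(x_sig) - len(y_sig)
    # Same length: the first differing digit decides.
    for dx, dy in zip(x_sig, y_sig):
        if dx != dy:
            return ord(dx) - ord(dy)
    return 0


print(compare_numeric_chunks("123", "184") < 0)  # True, decided by 2 < 8
print(compare_numeric_chunks("0001", "1") == 0)  # True, equal in value
print(compare_numeric_chunks("99999999999999999999", "100000000000000000000") < 0)  # True
```

With this rule, 0001 and 1 compare as equal, so a caller that needs a strict ordering would still have to pick a tie-break, for example by original chunk length.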

    Conclusion

    Software development has matured beyond the point where simply sorting strings by their ASCII value is acceptable. It is my hope that the Alphanum Algorithm is adopted by all developers so we can work together to create software applications that make sense to users. Feel free to download and share the algorithm, place it in your program free of charge, and help spread the word.

    Epilogue: Let's see another example!

    Here's an example using fictitious product names. Imagine you're developing an application for a customer, and you need to instill a sense of confidence and professionalism in your product line. Which sorted list would you most associate with those feelings?

    Traditional Sort               Alphanum
    1000X Radonius Maximus         10X Radonius
    10X Radonius                   20X Radonius
    200X Radonius                  20X Radonius Prime
    20X Radonius                   30X Radonius
    20X Radonius Prime             40X Radonius
    30X Radonius                   200X Radonius
    40X Radonius                   1000X Radonius Maximus
    Allegia 50 Clasteron           Allegia 6R Clasteron
    Allegia 500 Clasteron          Allegia 50 Clasteron
    Allegia 50B Clasteron          Allegia 50B Clasteron
    Allegia 51 Clasteron           Allegia 51 Clasteron
    Allegia 6R Clasteron           Allegia 500 Clasteron
    Alpha 100                      Alpha 2
    Alpha 2                        Alpha 2A
    Alpha 200                      Alpha 2A-900
    Alpha 2A                       Alpha 2A-8000
    Alpha 2A-8000                  Alpha 100
    Alpha 2A-900                   Alpha 200
    Callisto Morphamax             Callisto Morphamax
    Callisto Morphamax 500         Callisto Morphamax 500
    Callisto Morphamax 5000        Callisto Morphamax 600
    Callisto Morphamax 600         Callisto Morphamax 700
    Callisto Morphamax 6000 SE     Callisto Morphamax 5000
    Callisto Morphamax 6000 SE2    Callisto Morphamax 6000 SE
    Callisto Morphamax 700         Callisto Morphamax 6000 SE2
    Callisto Morphamax 7000        Callisto Morphamax 7000
    Xiph Xlater 10000              Xiph Xlater 5
    Xiph Xlater 2000               Xiph Xlater 40
    Xiph Xlater 300                Xiph Xlater 50
    Xiph Xlater 40                 Xiph Xlater 58
    Xiph Xlater 5                  Xiph Xlater 300
    Xiph Xlater 50                 Xiph Xlater 500
    Xiph Xlater 500                Xiph Xlater 2000
    Xiph Xlater 5000               Xiph Xlater 5000
    Xiph Xlater 58                 Xiph Xlater 10000

    Links from Blogs

    Even though I wrote this algorithm in 1997 (good thing algorithms are timeless!), it wasn't until December 2007 that this page started to be shared and discussed on a couple of blogs.

    Blogs and sites that have linked to this page, each of which has discussion threads and other links that you may find useful:

