Movatterモバイル変換


[0]ホーム

URL:


Skip to main content
Springer Nature Link
Log in

Generalized Hamming Distance

  • Published:
Information Retrieval Aims and scope Submit manuscript

Abstract

Many problems in information retrieval and related fields depend on a reliable measure of the distance or similarity between objects that, most frequently, are represented as vectors. This paper considers vectors of bits. Such data structures implement entities as diverse as bitmaps that indicate the occurrences of terms and bitstrings indicating the presence of edges in images. For such applications, a popular distance measure is the Hamming distance. The value of the Hamming distance for information retrieval applications is limited by the fact that it counts only exact matches, whereas in information retrieval, corresponding bits that are close by can still be considered to be almost identical. We define a “Generalized Hamming distance” that extends the Hamming concept to give partial credit for near misses, and suggest a dynamic programming algorithm that permits it to be computed efficiently. We envision many uses for such a measure. In this paper we define and prove some basic properties of the “Generalized Hamming distance”, and illustrate its use in the area of object recognition. We evaluate our implementation in a series of experiments, using autonomous robots to test the measure's effectiveness in relating similar bitstrings.

Article PDF

Similar content being viewed by others

Explore related subjects

Discover the latest articles and news from researchers in related subjects, suggested using machine learning.
Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

  • Adriaans P and Zantinge D (1996) Data Mining. Addison Wesley Longman Ltd, Harlow, England.

    Google Scholar 

  • Boykov Y and Huttenlocher D (1999) A new Bayesian framework for object recognition. In: proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society.

  • Bookstein A and Klein ST (1990) Information retrieval tools for literary analysis. In: Tjoa AM, Ed., Database and Expert Systems Applications. Springer-Verlag, Vienna, pp. 1–7.

    Google Scholar 

  • Bookstein A and Klein ST (1991) Compression of correlated bit-vectors. Information Systems, 16:387–400.

    Google Scholar 

  • Bookstein A, Klein ST and Raita T (1998) Clumping properties of content-bearing words. Journal of the American Society for Information Science, 49:102–114.

    Google Scholar 

  • Bookstein A, Klein ST and Raita T (2001) Fuzzy hamming distance: A new dissimilarity measure. In: Amir, Amihood and Landau, G Eds., Proceedings: 12th Annual Symposium on Combinatorial Pattern Matching (CPM2001). Jerusalem, Israel, 1-4 July, 2001.

  • Chang P and Krumm J (1999) Object recognition with color cooccurrence histograms. In: proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society.

  • Cormen TH, Leiserson CE and Rivest RL (1990) Introduction to Algorithms. MIT Press, Cambridge, MA.

    Google Scholar 

  • Crochemore M and Rytter W (1994) Text Algorithms. Oxford University Press, New York.

    Google Scholar 

  • Doyle L (1961) Semantic road maps for literature searchers. Journal of the ACM, 8(4):553–578.

    Google Scholar 

  • Hamming RW (1980) Coding and Information Theory. Prentice-Hall, Englewood Cliffs, NJ.

    Google Scholar 

  • Hearst MA (1994) Multi-paragraph segmentation of expository text. In: proc. ACL Conf. Las Cruces.

  • Hearst MA and Plaunt C (1993) Subtopic structuring for full-length document access. In: proc. 16-th ACM-SIGIR Conf. Pittsburgh, pp. 59–68.

  • Jansen BJ, Spink A, Bateman J and Saracevic T (1998) Real life information retrieval: A study of user queries on the web. SIGIR Forum, 32(1):5–18.

    Google Scholar 

  • Knuth DE (1973) The Art of Computer Programming, Vol I, Fundamental Algorithms, Addison-Wesley, Reading, MA.

    Google Scholar 

  • Kulyukin V and Burke A (in press) Mining free text for structure. In: wang J, Ed., Data Mining: Opportunities and Challenges, Idea Group Publishing, Pensylvania.

  • Kulyukin V and Morley N (2002) Integrated object recognition in the three-tiered robot architecture, In: Proc. of the International Conference on Artificial Intelligence, Las Vegas.

  • Kulyukin V and Zoko A (2000) Hamming distance for object recognition by mobile robots. In: Proceedings of the Research Symposium of the DePaul University School of Computer Science, Telecommunications and Information Systems (CTIRS-2000). DePaul University.

  • Murphy R (2000) AI Robotics. MIT Press, New York.

    Google Scholar 

  • Parker, JR (1993) Practical Computer Vision Using C. John Wiley and Sons, New York.

    Google Scholar 

  • Roman S (1992) Coding and Information Theory. Springer-Verlag, New York.

    Google Scholar 

  • Sankoff D and Kruskal JB (1983) Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, MA.

    Google Scholar 

  • Swain MJ and Ballard DH (1991) Color Indexing. International Journal of Computer Vision, 7:11–32.

    Google Scholar 

  • Tam AM and Leung CHC (2001) Structured natural-language descriptions for semantic content retrieval of visual materials. Journal of the American Society for Information Science and Technology, 52(11):930–937.

    Google Scholar 

  • Witten IH, Moffat A and Bell TC (1999) Managing Gigbytes. Morgan Kaufmann Publishers, San Fransisco.

    Google Scholar 

  • Young SS, Scott PD and Nasrabadi N (1994) Multi-layer hopfield neural network. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recogntion. IEEE Computer Society.

Download references

Author information

Authors and Affiliations

  1. Center for Information and Language Studies, University of Chicago, Chicago, IL, 60637

    Abraham Bookstein

  2. Computer Science Department, Utah State University, Logan, Utah, 84322-4205

    Vladimir A. Kulyukin

  3. Computer Science Department, University of Turku, 20520, Turku, Finland

    Timo Raita

Authors
  1. Abraham Bookstein
  2. Vladimir A. Kulyukin
  3. Timo Raita

Rights and permissions

About this article

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Advertisement


[8]ページ先頭

©2009-2025 Movatter.jp