Thread Rating:
  • 1 Vote(s) - 5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
[Tutorial] Fast Identification
#1
This is an updated version of an article I originally posted on my website.

Introduction
Often you want to be able to refer to something in an application by name, say you had a scene-graph of nodes and wanted to be able to find a node by it's name for simplicity. The problem with this is that a string compare is a relatively slow operation, and when performance is key, should be avoided. However there is a way to be able to refer to something by name, but still do so in a timely manner; this tutorial will guide you through creating a class which uses a hashed string to make comparing a string almost as fast as comparing an integer.

Hashing a String
Hashing a string involves converting it from it's lexical form, into an integral form which should hopefully uniquely represent the string within the context that you need it to. There are a load of ways to hash a string, CRC and MD5 are just to name two; if you have a compiler with TR1 support, there is even a hash function in that. My preferred method of hashing a string in C++ is to use the MurmurHash2 algorithm by Austin Appleby (the code for which can be found here) and is the method that I will use in this tutorial to hash the string.

The Identification Class
The identification class itself is really simple, all it needs is two member variables (one for the string, and one for the hash), a function to get/set the string and another to get the hash, and finally some overloaded comparison operators to make it easier to compare the identification objects. The header for the class is shown below.
Code:
#ifndef CIDENT_H
#define CIDENT_H

#include <string>

//! Class to identify other objects using a hashed string for optimal comparisons
class CIdent
{
    public:
                                //! Constructor
                                CIdent();

                                //! Constructor
                                CIdent(
                                    const std::string &str            //!< String to set from
                                    );

                                //! Constructor
                                CIdent(
                                    const char *const str            //!< String to set from
                                    );

                                //! Destructor
                                ~CIdent();

                                //! Set the identifier from a string
        void                    setIdent(
                                    const std::string &str            //!< String to set from
                                    );

                                //! Comparison operator (compares hashes)
                                //! \return true if the hashes match, false otherwise
        inline bool                operator==(
                                    const CIdent &other                //!< Identifier to compare against
                                    ) const { return (m_hash == other.m_hash); }

                                //! Comparison operator (compares hashes)
                                //! \return true if the hashes match, false otherwise
        bool                    operator==(
                                    const std::string &str            //!< String to compare against
                                    ) const;

                                //! Comparison operator (compares hashes)
                                //! \return true if the hashes match, false otherwise
        inline bool                operator==(
                                    const unsigned int hash            //!< String to compare against
                                    ) const { return (m_hash == hash); }

                                //! Comparison operator (compares hashes)
                                //! \return true if the hashes don't match, false otherwise
        inline bool                operator!=(
                                    const CIdent &other                //!< Identifier to compare against
                                    ) const { return (!(*this == other)); }

                                //! Comparison operator (compares hashes)
                                //! \return true if the hashes don't match, false otherwise
        inline bool                operator!=(
                                    const std::string &str            //!< String to compare against
                                    ) const { return (!(*this == str)); }

                                //! Comparison operator (compares hashes)
                                //! \return true if the hashes don't match, false otherwise
        inline bool                operator!=(
                                    const unsigned int hash            //!< String to compare against
                                    ) const { return (!(*this == hash)); }

                                //! Get the string the identifier represents
                                //! \return The string this identifier represents
        const std::string&        getString() const { return m_string; }

                                //! Get the hash for the string
                                //! \return The hash for the string

        inline unsigned int        getHash() const { return m_hash; }

    private:
        std::string                m_string;                            //!< The string this identifier represents
        unsigned int            m_hash;                                //!< A hash of the string in this identifier
};

#endif
Sorry about the formatting being a bit off, code tags never convert tabs quite correctly.

See, I told you it was simple (don't be intimated by all the comparison operators, they are all doing pretty much the same thing). All that's left to do is define the functions to set the identifier, and to compare the hashes.

Code:
// ------------------------------------------------------------------
/*!
    Required headers
*/
#include "CIdent.h"
#include "MurmurHash2.h"


// ------------------------------------------------------------------
/*!
    Constructor
*/
CIdent::CIdent() : m_string(""), m_hash(0)
{
}


// ------------------------------------------------------------------
/*!
    Constructor
*/
CIdent::CIdent(
    const std::string &str            //!< String to set from
    )
{
    setIdent(str);
}


// ------------------------------------------------------------------
/*!
    Constructor
*/
CIdent::CIdent(
    const char *const str            //!< String to set from
    )
{
    setIdent(std::string(str));
}


// ------------------------------------------------------------------
/*!
    Destructor
*/
CIdent::~CIdent()
{
}


// ------------------------------------------------------------------
/*!
    Set the identifier from a string
*/
void CIdent::setIdent(
    const std::string &str            //!< String to set from
    )
{
    m_string    = str;
    m_hash        = murmurHash2(str.c_str(), static_cast<unsigned int>(str.length()), 0);
}


// ------------------------------------------------------------------
/*!
    Comparison operator (compares hashes)
    \return true if the hashes match, false otherwise
*/
bool CIdent::operator==(
    const std::string &str            //!< String to compare against
    ) const
{
    return (m_hash == murmurHash2(str.c_str(), static_cast<unsigned int>(str.length()), 0));
}

And as you can see, aside from the constructors, there are only two functions, both of which are really small and simple. The setIdent function just sets the string to the string passed in, and then hashes the string with MurmurHash2 and stores the returned hash. The overloaded comparison operator just compares the hash of one string against the hash of the string passed in (the rest of the comparison operators are defined inline in the header file).

Conclusion
If you use that class where you would have used a string to identify something, then you will have a pre-computed hash available for comparison using an unsigned integer compare; which is much faster than a string compare even taking into consideration the overhead of hashing the string (which is why it is better to store the hash in the class than calculate it everytime*). Obviously, the more times you compare the hash without having to hash another string (say, iterating through 100 nodes looking for one by name), the greater the overall benefit.

*If you are generating the hash every time for everything then this method will be even slower than a string compare due to the hashing overhead. You must store an instance of the identifier somewhere (usually on the object it relates to) to get the maximum efficiency.
[Image: sig.php]
The little boat gently drifted across the pond exactly the way a bowling ball wouldn't.
Reply
#2
Amazing, really good work MrD.!
This will come in handy soon!...

But what's with the TAB size.... I needed a bit to understand what where was, but it's nothing big Tongue
Reply
#3
Heh Smile

The tab spacing is something I acquired from the coding standards at work. One column for the return types, one column for the members (with parameters each on their own line) and then a final column for doxygen comments.

It looks a bit weird when you first see it, but now I'm used to it I can read it just like English and use it all the time.
[Image: sig.php]
The little boat gently drifted across the pond exactly the way a bowling ball wouldn't.
Reply
#4
Ahhh, I get it... I does look bit weird but as you said, I'm now able to read it normal.....

Lines like those was a small pain... but yeah it's history ;)

Code:
inline bool                operator==(
                                    const unsigned int hash            //!< String to compare against
                                    ) const { return (m_hash == hash); }
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)