2.9 String

The String class is a string container providing methods to store and manipulate strings. LYRIC defines the symbols (characters) stored in a string as unicode, which is a 32-bit unsigned integer. Although the storage type is named unicode, the String class has no Unicode-specific features. Here unicode only means that the string can store 32-bit Unicode symbols.

The String class behaves slightly differently in debug and optimized mode, depending on whether your program is linked with the debug or the release version of the library. In debug mode all members build the ASCII representation of the string (whenever possible), which slows down performance considerably but keeps a readable copy of the string content available. In the optimized version of the library the ASCII representation is built only when needed (when the char* operator is called), which increases performance.

You will probably run into compiler complaints about ambiguities when using the subscript operator of the String class, especially when the subscript argument is a constant number, like [7]. The String subscript operator takes a Size argument, which is an unsigned int, while a plain number is interpreted as a signed int (or simply int). The ambiguity itself comes from the casting operator char*. Indeed, a C++ compiler knows the following subscript operators after the String class definition:


    operator[](char*, int)
    operator[](String, unsigned int)

Now assume a String s. When we write:


    s[0] = 'a';

a compiler will try to match


    operator[](String, int)

with one of the two subscript operators it knows. Each of them needs a conversion to match: the first from String to char*, the second from int to unsigned int. Neither candidate is better than the other, which results in an ambiguity.

The easiest way to remove the ambiguity is to make the argument an unsigned int. This can be done by adding a u suffix right after the number. The following line has no ambiguity:


    s[0u] = 'a';
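
The following is a minimal sketch of the situation, not LYRIC code. MiniString and write_first_two are hypothetical names, and Size is assumed to be unsigned int as stated above. The class offers the same two ingredients, a char* conversion and a subscript operator taking an unsigned index. Note that in standard C++ the built-in pointer subscript takes a std::ptrdiff_t index, so whether a plain s[0] is actually rejected depends on the platform. The sketch therefore only uses unsigned indices, which select the member operator everywhere.

```cpp
#include <cassert>
#include <vector>

typedef unsigned int Size;  // assumption: mirrors LYRIC's Size

// Hypothetical stand-in for illustration only; this is not the LYRIC String.
class MiniString {
public:
    explicit MiniString(const char* ascii) {
        for (; *ascii != '\0'; ++ascii) {
            symbols_.push_back(static_cast<unsigned char>(*ascii));
        }
    }

    // The conversion that also makes the built-in pointer subscript a candidate.
    operator char*() const {
        ascii_.clear();
        for (Size i = 0; i < symbols_.size(); ++i) {
            ascii_.push_back(static_cast<char>(symbols_[i]));
        }
        ascii_.push_back('\0');
        return &ascii_[0];
    }

    // The member subscript taking an unsigned index.
    unsigned int& operator[](Size index) { return symbols_[index]; }

private:
    std::vector<unsigned int> symbols_;
    mutable std::vector<char> ascii_;
};

// Writes through the member subscript using indices that are already unsigned.
char write_first_two(MiniString& s) {
    s[0u] = 'a';            // literal with the u suffix
    const Size second = 1;
    s[second] = 'b';        // variable of type Size
    const char* text = s;   // deliberate use of the char* conversion
    assert(text[1] == 'b');
    return text[0];
}
```

An index held in a variable of type Size needs no suffix, so the u is only required for literal constants.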


Synopsis


  #include <lyric/String.hpp>
  
  class String : private List<char32>
  {
  public:
    typedef char32 Symbol;
    ~String ();
    String ();
    String (const String& string)
      throw (Exception::Memory::Alloc);
    String (const char* ascii)
      throw (Exception::Memory::Alloc);
    String (const wchar_t* wascii)
      throw (Exception::Memory::Alloc);
    String (char ch)
      throw (Exception::Memory::Alloc);
    operator char* () const
      throw (Exception::Memory::Alloc);
    Symbol& operator [] (Size index)
      throw (Exception::Memory::Range);
    const Symbol& operator [] (Size index) const
      throw (Exception::Memory::Range);
    String& operator = (const char* ascii)
      throw (Exception::Memory::Alloc);
    String& operator = (const String& string)
      throw (Exception::Memory::Alloc);
    String& operator << (const char* ascii)
      throw (Exception::Memory::Alloc);
    bool operator == (const String& string) const;
    bool operator == (const char* ascii) const;
    bool operator != (const String& string) const;
    bool operator != (const char* ascii) const;
    bool operator < (const String& string) const;
    bool operator > (const String& string) const;
    String& operator += (const String& string);
    String& operator += (Symbol ch);
    friend String operator + (const String& str1, const String& str2);
    friend ostream& operator << (ostream& os, const String& string);
    Size length () const;
    void clean ();
    void create (const char* ascii, Size leng);
    void append (const Symbol ch);
    void append (const String& string);
    void append (const char* ascii, Size leng);
    void insert (Size index, const Symbol& ch);
    void insert (Size index, const String& string);
    void remove (Size index);
    void remove (Size index, Size size);
    String sub (Size index, Size size) const;
    String sub (const SubId& subid) const;
    void capitalize ();
    void lowerize ();
    bool contains (const String& sub) const;
    List<Size> pos (const String& sub) const;
    List<String> tokens (const String& delimiter) const;
    List<String> split (const Regexp& rule) const
      throw (Exception::Memory::Alloc);
    void rmlsp ();
    void rmtsp ();
    void rmltsp ();
  };


Description


~String ()
Destroys this string, releasing all used memory resources.

String ()
Constructs this string as an empty string.

String (const String& string)
Constructs this string from the given string. All properties and data stored in string are cloned into this string.
|\ Exception::Memory::Alloc
is thrown if not enough memory is found to store string into this.

String (const char* ascii)
Constructs this string from the given ascii string. Data stored in ascii is copied into this string.
|\ Exception::Memory::Alloc
is thrown if not enough memory is found to store the ascii string in this string.

String (const wchar_t* wascii)
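Constructs this string from the given wascii string. Data stored in wascii is copied into this string.
|\ Exception::Memory::Alloc
is thrown if not enough memory is found to store the wascii string in this string.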

String (char ch)
Constructs this string from the given character. The resulting string has length 1 and contains the given character.
|\ Exception::Memory::Alloc
is thrown if not enough memory is found to store the character in this.

operator char* () const
Returns this string as a single-byte ASCII string. This casting operator is used to pass LYRIC strings to C functions taking char* arguments (like printf, fprintf, open, etc.).
The returned pointer points into a memory area managed by this string. This area stays valid as long as this string exists and has not been modified, i.e. as long as only const member functions and operators are called. This operator itself is the exception to the rule: although it is const, calling it again can change the memory area, either in location or in content.
It is highly recommended not to rely on a long lifetime of the memory area pointed to by the return value of this operator. Using it as an argument to C functions is fine, but assigning it to a char* variable for later use is dangerous, since this string can release the memory area without prior notice.
Never pass the pointer returned by this operator to the C memory management functions (free, realloc). Doing so breaks this string’s internal functionality and may produce memory errors in the most unexpected places in your code.
|\ Exception::Memory::Alloc
is thrown if not enough memory is found to store this.length() bytes.
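
As a minimal sketch of the lifetime rule above, the class below builds its ASCII form lazily and rebuilds it after a modification. It is a stand-in written for this illustration, not the LYRIC implementation. LazyAsciiSketch, append and snapshot are hypothetical names, and mapping symbols outside ASCII to '?' is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in, not the LYRIC String: the ASCII form is built lazily.
class LazyAsciiSketch {
public:
    explicit LazyAsciiSketch(const std::u32string& symbols)
        : symbols_(symbols), stale_(true) {}

    void append(char32_t ch) {
        symbols_.push_back(ch);
        stale_ = true;  // the cached ASCII form no longer matches
    }

    // Rebuilds the cached ASCII buffer only when the content has changed.
    operator char*() const {
        if (stale_) {
            ascii_.clear();
            for (char32_t c : symbols_) {
                // Assumption: symbols outside ASCII are shown as '?'.
                ascii_.push_back(c < 128 ? static_cast<char>(c) : '?');
            }
            ascii_.push_back('\0');
            stale_ = false;
        }
        return &ascii_[0];
    }

private:
    std::u32string symbols_;
    mutable std::vector<char> ascii_;
    mutable bool stale_;
};

// Copies the bytes immediately instead of keeping the returned pointer.
std::string snapshot(const LazyAsciiSketch& s) {
    const char* ascii = s;  // only valid until s is modified or converted again
    assert(ascii != nullptr);
    return std::string(ascii);
}
```

Taking a copy, as snapshot does, is one way to keep the text for later use without holding on to memory that the string still manages.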

String::Symbol& operator [] (Size index)
Returns a reference to the symbol at index in this string. This operator can be used to modify the content of a string at the given index.
|\ Exception::Memory::Range
is thrown if the given index is outside this container’s size. The check is only made when the debug version of LYRIC (-lyric-g) is linked.

const String::Symbol& operator [] (Size index) const
Returns a const reference to the symbol at index in this string, this string being const itself.
|\ Exception::Memory::Range
is thrown if the given index is outside this container’s size. The check is only made when the debug version of LYRIC (-lyric-g) is linked.

String& operator = (const char* ascii)
Assigns the given ascii string to this string. Data stored in ascii is copied into this string.
|\ Exception::Memory::Alloc
is thrown if not enough memory is found to store the ascii string in this string.

String& operator = (const String& string)
Assigns the given string to this string. All properties and data stored in the given string are copied into this string. A reference to this string is returned so that assignments can be chained.
|\ Exception::Memory::Alloc
is thrown if not enough memory is found to store string into this.

bool operator == (const String& string) const
Compares this with string and returns true if both contain the same information, false if not.

bool operator != (const String& string) const
Compares this with string and returns false if both contain the same information, true if not.

void remove (Size index, Size size)
Removes a sub-part from this string. The sub-part is given by its starting position index into this, and its size.

String sub (Size index, Size size) const
Returns a sub-string of this string. The sub-string is given by its starting position index into this, and its size.
Note: Incomplete. No range checking right now.
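
Since no range checking is done, a caller may want to validate index and size beforehand. The sketch below shows one overflow-safe way to write such a check. It uses std::u32string as a stand-in for the LYRIC String, and range_ok and sub_sketch are hypothetical helper names that are not part of LYRIC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if [index, index + size) lies inside a string of
// the given length. Written so that index + size is never computed and
// therefore cannot overflow.
bool range_ok(std::size_t length, std::size_t index, std::size_t size) {
    return index <= length && size <= length - index;
}

// Stand-in for sub(index, size) on std::u32string, guarded by range_ok.
std::u32string sub_sketch(const std::u32string& text,
                          std::size_t index, std::size_t size) {
    assert(range_ok(text.size(), index, size));
    return text.substr(index, size);
}
```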

String sub (const String::SubId& subid) const
Returns a sub-string of this string. The sub-string is given by the sub-string identifier subid.
Note: Incomplete. No range checking right now.

void capitalize ()
Forces the first alphabetical symbol in this string to uppercase. Leading non-alphabetical symbols in this string are skipped during the parsing. Only that first letter is changed, and only if it is lowercase.

void lowerize ()
Changes all uppercase letters in this string to lowercase, provided a lowercase form of the letter is defined.
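
The two case operations above can be imitated as follows. This is a minimal sketch restricted to ASCII letters and working on std::u32string. The function names are hypothetical, and how LYRIC treats letters outside ASCII is not covered here.

```cpp
#include <cassert>
#include <string>

// Hypothetical ASCII-only stand-in for capitalize().
void capitalize_sketch(std::u32string& s) {
    for (char32_t& c : s) {
        const bool lower = (c >= U'a' && c <= U'z');
        const bool upper = (c >= U'A' && c <= U'Z');
        if (lower) {
            c = static_cast<char32_t>(c - U'a' + U'A');
        }
        if (lower || upper) {
            break;  // the first alphabetical symbol ends the scan
        }
    }
}

// Hypothetical ASCII-only stand-in for lowerize().
void lowerize_sketch(std::u32string& s) {
    for (char32_t& c : s) {
        if (c >= U'A' && c <= U'Z') {
            c = static_cast<char32_t>(c - U'A' + U'a');
        }
    }
}

// Usage: leading blanks and digits are skipped, later words are untouched.
void case_demo() {
    std::u32string title = U"  42 hello World";
    capitalize_sketch(title);
    assert(title == U"  42 Hello World");
}
```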

bool contains (const String& sub) const
Returns true if this string contains the given sub-string, false if not.

List<Size> pos (const String& sub) const
Returns the positions of the given sub-string in this string, as a list of sizes.
Note: The robustness of the sub-string search has not been established. It probably won’t handle partly overlapping sub-strings.
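
To illustrate what overlapping occurrences mean for such a search, here is a minimal sketch on std::u32string. The function positions_sketch and its allow_overlap flag are hypothetical and not part of LYRIC. Which of the two behaviours pos actually has is not stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: collects the start positions of sub in text.
// With allow_overlap the search resumes one symbol after each hit,
// otherwise it resumes after the end of the hit.
std::vector<std::size_t> positions_sketch(const std::u32string& text,
                                          const std::u32string& sub,
                                          bool allow_overlap) {
    std::vector<std::size_t> hits;
    if (sub.empty()) {
        return hits;  // assumption: an empty sub-string is never found
    }
    std::size_t from = 0;
    while ((from = text.find(sub, from)) != std::u32string::npos) {
        hits.push_back(from);
        from += allow_overlap ? 1 : sub.size();
    }
    return hits;
}

// Usage: "aa" occurs three times in "aaaa" if occurrences may overlap.
void positions_demo() {
    assert(positions_sketch(U"aaaa", U"aa", true).size() == 3);
    assert(positions_sketch(U"aaaa", U"aa", false).size() == 2);
}
```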

List<String> tokens (const String& delimiter) const
Tokenises this string into parts, returned in a list of strings, using a given delimiter. The delimiter can be a single character or a string. The returned sub-strings are the text parts bounded by the given delimiter. If the delimiter is not found in this string, the only returned sub-string is this string. The delimiter itself is not part of the returned sub-strings. This string is left unchanged.
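
The description above can be sketched on std::u32string as follows. The function tokens_sketch is a hypothetical stand-in, not the LYRIC implementation. Keeping empty parts between adjacent delimiters and returning the whole text for an empty delimiter are assumptions, since the text above does not cover these cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for tokens(): cuts text at every delimiter.
std::vector<std::u32string> tokens_sketch(const std::u32string& text,
                                          const std::u32string& delimiter) {
    std::vector<std::u32string> parts;
    if (delimiter.empty()) {
        parts.push_back(text);  // assumption: nothing to cut at
        return parts;
    }
    std::size_t start = 0;
    for (;;) {
        const std::size_t hit = text.find(delimiter, start);
        if (hit == std::u32string::npos) {
            break;
        }
        parts.push_back(text.substr(start, hit - start));
        start = hit + delimiter.size();  // the delimiter is not kept
    }
    parts.push_back(text.substr(start));  // text after the last delimiter
    return parts;
}

// Usage: a delimiter that is not found yields the whole text as one part.
void tokens_demo() {
    assert(tokens_sketch(U"a::b::c", U"::").size() == 3);
    assert(tokens_sketch(U"abc", U",").size() == 1);
}
```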

List<String> split (const Regexp& rule) const
Splits this string into parts, returned as a list of strings, according to a given regular expression splitting rule. Each returned sub-string is either a matched expression or a piece of unmatched text. This string is left unchanged.
A couple of examples (like the Path::expand implementation) would help to understand the splitting.
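
As a minimal sketch of one possible reading, the function below returns matched expressions and unmatched text interleaved in their original order. It uses std::regex and std::string as stand-ins for LYRIC’s Regexp and String. The function name split_sketch, the sample input and the skipping of empty matches are assumptions made for this illustration, not documented LYRIC behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for split(): every match and every stretch of
// unmatched text becomes one entry, in the order found in the text.
std::vector<std::string> split_sketch(const std::string& text,
                                      const std::regex& rule) {
    std::vector<std::string> parts;
    std::size_t last = 0;  // end of the previous match
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), rule); it != end; ++it) {
        const std::smatch& m = *it;
        if (m.length(0) == 0) {
            continue;  // assumption: empty matches are ignored
        }
        const std::size_t pos = static_cast<std::size_t>(m.position(0));
        if (pos > last) {
            parts.push_back(text.substr(last, pos - last));  // unmatched text
        }
        parts.push_back(m.str(0));  // matched expression
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    if (last < text.size()) {
        parts.push_back(text.substr(last));  // unmatched tail
    }
    return parts;
}

// Usage: variable references are matched, the text between them is unmatched.
void split_demo() {
    const std::regex variable("\\$[A-Z]+");
    const std::vector<std::string> parts = split_sketch("$HOME/bin:$PATH", variable);
    assert(parts.size() == 3);
    assert(parts[0] == "$HOME" && parts[1] == "/bin:" && parts[2] == "$PATH");
}
```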

void rmlsp ()
Removes leading spaces/tabs in this string. Removes all spaces/tabs preceding the first non space/tab character in this string. Doesn’t touch spaces/tabs within text.

void rmtsp ()
Removes trailing spaces/tabs in this string. Removes all spaces/tabs following the last non space/tab character in this string. Doesn’t touch spaces/tabs within text.

void rmltsp ()
Removes leading and trailing spaces/tabs in this string. Removes all spaces/tabs preceding the first non space/tab character and all spaces/tabs following the last non space/tab character in this string. Doesn’t touch spaces/tabs within text.
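
The three trimming members can be imitated on std::u32string as shown in this minimal sketch. The functions carry a _sketch suffix to mark them as hypothetical stand-ins. That a string consisting only of spaces/tabs ends up empty is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for rmlsp(): drops leading spaces/tabs.
void rmlsp_sketch(std::u32string& s) {
    // find_first_not_of returns npos for an all-blank string, and
    // erase(0, npos) then empties the string.
    s.erase(0, s.find_first_not_of(U" \t"));
}

// Hypothetical stand-in for rmtsp(): drops trailing spaces/tabs.
void rmtsp_sketch(std::u32string& s) {
    const std::u32string::size_type last = s.find_last_not_of(U" \t");
    s.erase(last == std::u32string::npos ? 0 : last + 1);
}

// Hypothetical stand-in for rmltsp(): both of the above.
void rmltsp_sketch(std::u32string& s) {
    rmlsp_sketch(s);
    rmtsp_sketch(s);
}

// Usage: blanks inside the text are kept.
void trim_demo() {
    std::u32string s = U" \t a b \t";
    rmltsp_sketch(s);
    assert(s == U"a b");
}
```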


  2.9.1 String::SubId