The One Man MMO Project
The story of a lone developer's quest to build an online world :: MMO programming, design, and industry commentary
Short String Optimization
By Robert Basler on 2021-08-17 18:43:24
Homepage: onemanmmo.com email:one at onemanmmo dot com

I have my own String class that manages all strings within Miranda. It supports static buffers for string data (i.e., on the stack), buffers for string literals (stored in the data segment), and heap-allocated buffers. Something I've wanted to do for a while is add a new kind of buffer where the string data is stored inside the String object itself, which saves a heap allocation when the string is short.

Originally I was going to add a small static buffer to the String class, but that seemed like a deoptimization, since it would make every String larger. Instead, I used a union to overlay ShortString directly on top of the pointer, the two size members, and the padding that LongString already contains.

My String class is 16 bytes on Win32 and 32 bytes on Win64, which is enough room for a good fraction of the strings the game uses. It uses UCS-2 internally, so every character is 2 bytes. (Yes, it's an obsolete format now that Unicode defines far more characters than 16 bits can represent, but variable-length encodings like UTF-8 are really inconvenient for string manipulation. At some point I'll change it to UTF-32.)

The String class uses the mFlags member to know which type of string buffer is in use. Normally the flags value would sit outside the union. One trick I used to reclaim a few bytes of padding for short-string storage was to give each struct its own mFlags member as its first field, rather than making mFlags a separate variable. In practice that means I can always read mShortString.mFlags, and it will work even when the union holds a LongString. As it happens, C++ does guarantee this for standard-layout structs like these: the first member of each struct is at offset 0, and because mFlags is part of both structs' common initial sequence, reading it through the inactive member is allowed. There is still a check in my code just to make sure.

I first tried using #pragma pack to recover that padding, but the game crashed almost immediately on a pointer that had been only half-assigned, so perhaps that part of the compiler's code generator isn't entirely reliable. It turned out that once I moved mFlags into the structs, I didn't need #pragma pack at all: the compiler laid out ShortString on 2-byte boundaries on its own, which was exactly what I wanted.

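// These structs live inside the String class. byte, sizet and wchar are
// the engine's own typedefs (presumably an unsigned 8-bit type, size_t and
// a 2-byte UCS-2 character).
struct LongString
{
  byte mFlags;                    // buffer type; must be the first member
  sizet mStringLength;            // current string length
  sizet mStringBufferSizeChars;   // buffer capacity in characters
  wchar* mString;                 // heap, stack or literal buffer
};

// Characters that fit in the bytes left after the two one-byte header fields.
static const unsigned int SHORT_STRING_SIZE = ( sizeof( LongString ) - 2 * sizeof( byte ) ) / sizeof( wchar );

struct ShortString
{
  byte mFlags;                         // same offset as LongString::mFlags
  byte mStringLength;                  // at most SHORT_STRING_SIZE, so a byte suffices
  wchar mString[ SHORT_STRING_SIZE ];  // characters stored inline
};

// Anonymous union: ShortString overlays LongString's members and padding.
union
{
  LongString mLongString;
  ShortString mShortString;
};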


So with these changes, String can store up to 7 characters on Win32 ((16 - 2) / 2) and 15 on Win64 ((32 - 2) / 2) without any heap allocation.
