µSer
This page explains all concepts of using µSer.
µSer's main task is to deal with the different kinds of data structures designed by the library's user. The object passed to uSer::serialize or uSer::deserialize is the root of a tree-like data structure. The leaves are always integral or enum types, and the inner nodes are various kinds of containers - similar to e.g. JSON, but the structure is fixed by the application's source code.
All standard integral types are supported by µSer: bool, char, char16_t, char32_t, wchar_t, short, int, long, long long and signed/unsigned variants. All extension types for which std::numeric_limits, std::is_signed_v and std::is_unsigned_v are specialized and for which std::is_integral_v<T> is true work as well. Enumeration types are cast to/from their underlying integral type for (de)serialization.
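The enum conversion mentioned above can be pictured with a small hand-written sketch. It does not use µSer; `Color`, `toRaw` and `fromRaw` are hypothetical names chosen only for illustration.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical example enum with an explicit underlying type.
enum class Color : std::uint8_t { Red = 1, Green = 2, Blue = 3 };

// Convert an enum to its underlying integral type before writing it out.
template <typename E>
constexpr std::underlying_type_t<E> toRaw(E e) {
    static_assert(std::is_enum_v<E>, "toRaw expects an enumeration type");
    return static_cast<std::underlying_type_t<E>>(e);
}

// Convert a raw integer back to the enum after reading it.
template <typename E>
constexpr E fromRaw(std::underlying_type_t<E> raw) {
    static_assert(std::is_enum_v<E>, "fromRaw expects an enumeration type");
    return static_cast<E>(raw);
}
```

The underlying type decides how many bits end up in the raw stream, so giving serializable enums an explicit underlying type keeps the layout predictable.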
Integers may be stored inside container types, which in turn can be stored in more containers. µSer supports different kinds of containers: C-Arrays, std::array and all other homogeneous containers supporting iterators as well as structs that have non-static member variables, std::tuple and std::pair, which can be considered heterogeneous containers. µSer never resizes containers - before deserialization, the user has to allocate elements that µSer will write to. Structs have to be annotated to make their members known to µSer; all other containers can be directly used. The contained elements are (de)serialized consecutively.
We have already seen how to serialize an integer:
Serializing a container essentially works the same way:
This example outputs "TEST" too, but this time we packed two bytes into one 16-Bit-Integer.
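What happens to the bytes can be sketched by hand, without µSer. The helper `packLE16` is a hypothetical name, and the values 0x4554 and 0x5453 are an assumption about which 16-bit integers produce "TEST" in the default little endian order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of uSer): split 16-bit integers into bytes,
// least significant byte first (little endian).
template <std::size_t N>
constexpr std::array<std::uint8_t, 2 * N>
packLE16(const std::array<std::uint16_t, N>& values) {
    std::array<std::uint8_t, 2 * N> raw{};
    for (std::size_t i = 0; i < N; ++i) {
        raw[2 * i]     = static_cast<std::uint8_t>(values[i] & 0xFF);  // low byte
        raw[2 * i + 1] = static_cast<std::uint8_t>(values[i] >> 8);    // high byte
    }
    return raw;
}
```

With the input `{0x4554, 0x5453}` the resulting bytes are 0x54, 0x45, 0x53, 0x54, which are the ASCII codes of "TEST".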
Tuples allow us to combine values of different types:
The output still stays the same.
Until now, µSer was able to check at compile time whether the data fits into the raw array, since the sizes of both are fixed. This is not possible when a container of dynamic size, such as std::vector, is used. In that case the container might be too large to fit into the raw data array. µSer then automatically implements a size check to prevent buffer overflow. To react to an error condition appropriately, we need to implement some error handling. The easiest way to do that is to define the macro USER_EXCEPTIONS before including the uSer.hh file to enable exception support, and catch the exceptions:
Adding another element to the vector will provoke the error. See Error Handling for more information. As serializing structs needs some extra work, it is documented in Defining and Annotating structs.
Before we dive further into the API documentation, we need to think about how data is mapped from the raw binary stream to the C++ data structures. This is merely conceptual; µSer does calculations on whole integers and not individual bits.
As stated previously, µSer (de)serializes data to/from a stream of unsigned integers of equal type. We will call these integers "serialization word" or short "SWord". Typically, std::uint8_t or unsigned char will be used here. Depending on the architecture, using a larger type may improve performance, particularly if the size of a SWord corresponds to the register size of the processor. For example, on a 32-Bit-Architecture std::uint32_t may yield best results. However, since the size of the elements of dynamic data structures must be a multiple of the SWord size, the size of the SWord may not be chosen arbitrarily depending on the data structures.
The raw binary in/out data is seen as a stream of bits which are made up of the sequence of SWords. The least significant bit of the SWords always comes first, and the most significant one comes last. If you serialize some data into a std::uint8_t stream, then combine pairs of those into std::uint16_t integers in a little endian fashion and deserialize the resulting stream, you should end up with the same data. If some communication channel combines 8-Bit-Integers into 16-Bit-Integers in a big endian fashion, the above rule would be violated - bits 8-15 of the first integer would come first, and then bits 0-7. To deserialize such a data stream, you have to swap the bytes manually or by calling uSer::deserialize a second time (see below for explanation of the API usage):
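The manual variant can be sketched without µSer as follows. All three function names are hypothetical; the sketch only shows how a 16-bit word combined in big endian fashion differs from the little endian combination that the bit stream rule expects, and how swapping the bytes repairs it.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of uSer).

// Combine two bytes of the stream into one 16-bit word, first byte in the
// low bits. This matches the "least significant bit first" rule.
constexpr std::uint16_t combineLE(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint16_t>(first | (second << 8));
}

// What a big endian channel would produce instead: first byte in the high bits.
constexpr std::uint16_t combineBE(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint16_t>((first << 8) | second);
}

// Swap the two bytes of a 16-bit word to repair a big endian combination.
constexpr std::uint16_t swapBytes(std::uint16_t w) {
    return static_cast<std::uint16_t>((w << 8) | (w >> 8));
}
```

Applying `swapBytes` to every word received from such a channel restores the order in which the bytes were originally written.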
The bits in one serialization word can be configured by the uSer::RawInfo attribute. This can be used to e.g. serialize data into 7-bit-Words or other word sizes for which the platform offers no integer type. The serialization word type must be large enough to store those bits.
Of the tree formed by the C++ data structure, only the leaves, i.e. the integers (or enums converted to integers) are mapped to the bit stream. No meta information about the types or container sizes is written or read such that the application can implement most given formats. In essence, all integers and enums in the data structure are mapped to a "flat" sequence of integers of different size whose bits are then mapped to the raw data stream. An integer may be split into bytes whose order can be configured as the byte order (usually little or big endian). Of each byte the least significant bit is serialized first, and the most significant one last. This means that the bytes are always stored "forwards" in the raw data stream and never reversed. If individual integers or all SWords need to be reversed, this has to be done manually.
µSer's behaviour can be controlled by attributes. These are a set of types, some of them templates, that are passed via type parameters to µSer. No instances of those types are ever created, and they provide no member functions for application code. Attributes are valid for one object. If they are used on a container type, they are applied to the contained elements. Structs are an exception: attributes applied to a struct are only used on the members if they are marked as inheritable. The easiest way to specify attributes is by passing them as template parameters to the serialize and deserialize functions:
The uSer::ByteOrder::BE attribute configures the byte order to big endian, which reverses the string written to standard output. The uSer::Width attribute sets the size of the integer explicitly, see Integer Sizes. Entire structs and their members can be annotated with attributes as well, see below for details.
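The general idea of attributes as empty types that select behaviour at compile time can be sketched in a few lines. This is not µSer's implementation; `LittleEndian`, `BigEndian`, `hasAttr` and `serialize16` are hypothetical names used only to illustrate the technique.

```cpp
#include <array>
#include <cstdint>
#include <type_traits>

// Hypothetical tag types standing in for byte order attributes.
// They are never instantiated; only their identity as a type matters.
struct LittleEndian {};
struct BigEndian {};

// True if Tag occurs in the attribute list (false for an empty list).
template <typename Tag, typename... Attrs>
inline constexpr bool hasAttr = (std::is_same_v<Tag, Attrs> || ...);

// Serialize a 16-bit integer; the byte order is selected at compile time
// by the attribute types and defaults to little endian.
template <typename... Attrs>
constexpr std::array<std::uint8_t, 2> serialize16(std::uint16_t v) {
    static_assert(!(hasAttr<LittleEndian, Attrs...> && hasAttr<BigEndian, Attrs...>),
                  "byte order attributes are mutually exclusive");
    const auto lo = static_cast<std::uint8_t>(v & 0xFF);
    const auto hi = static_cast<std::uint8_t>(v >> 8);
    if constexpr (hasAttr<BigEndian, Attrs...>) {
        return {{hi, lo}};
    } else {
        return {{lo, hi}};
    }
}
```

Because the selection happens in `if constexpr`, the unused branch generates no code, which is one reason why type-based attributes suit resource-constrained targets.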
The following attributes are available:
Category | Name | Default | Inheritable | Description |
---|---|---|---|---|
Byte Order | uSer::ByteOrder::LE | ✔ | ✔ | Serializes integers in little endian byte order, i.e. the least significant byte first. |
Byte Order | uSer::ByteOrder::BE | | ✔ | Serializes integers in big endian byte order, i.e. the most significant byte first. |
Byte Order | uSer::ByteOrder::PDP | | ✔ | Serializes integers in PDP-endian byte order, i.e. serialize 32bit-Integers as two 16bit-integers, the most significant one first, and each internally as little endian. |
Sign Format | uSer::SignFormat::TwosComplement | ✔ | ✔ | Stores signed integers in 2's complement, the standard for most architectures. The top half of the unsigned integer range is mapped on the negative numbers until -1, keeping the order. |
Sign Format | uSer::SignFormat::SignedMagnitude | | ✔ | Stores signed integers in Signed-Magnitude format, i.e. an absolute value and a sign bit that defines whether the value is positive or negative. |
Sign Format | uSer::SignFormat::OnesComplement | | ✔ | Stores signed integers in 1's complement, similar to Signed-Magnitude but the absolute value is bitwise negated for negative values. |
Padding | uSer::Padding::None | ✔ | ✘ | No additional dummy bits after integers. |
Padding | uSer::Padding::Fixed | | ✘ | Stores a given fixed amount of bits after an integer, e.g. to accommodate alignment requirements. |
Dynamic Data | uSer::Dyn::Size | | ✘ | Define the size of a container depending on runtime information |
Dynamic Data | uSer::Dyn::Optional | | ✘ | Define optional data objects depending on runtime information |
Hooks | uSer::Hook::SerPre | | ✘ | Invoke a user-provided callback function before serializing an object |
Hooks | uSer::Hook::SerPost | | ✘ | Invoke a user-provided callback function after serializing an object |
Hooks | uSer::Hook::DeSerPre | | ✘ | Invoke a user-provided callback function before deserializing an object |
Hooks | uSer::Hook::DeSerPost | | ✘ | Invoke a user-provided callback function after deserializing an object |
Integer Width | uSer::Width | | ✔ | Manually define the size of an integer in bits |
SWord information | uSer::RawInfo | | ✘ | Explicitly define the serialization word and optionally its size; useful if an iterator is used for which std::iterator_traits<>::value_type is not defined, e.g. std::back_insert_iterator. Only valid in the argument list of serialize and deserialize. |
The attributes belonging to one category are mutually exclusive; at most one of them may be defined on an object. If a sub-object has an attribute defined that conflicts with an inheritable attribute of a surrounding object, the attribute of the sub-object takes precedence. For example, if a struct is annotated with uSer::ByteOrder::LE but a member is annotated with uSer::ByteOrder::BE, the latter is effective for that member and its sub-objects, if any. The complete reference to attributes is found here.
The main entry points of µSer's API are the functions serialize and deserialize which have several overloads to accommodate different use cases. The two functions have almost identical signatures that only differ in const-ness of the first two parameters. Because simply listing all overloads wouldn't be much help, we'll look at the functions on a more abstract level:
The meaning of the parameters is:
The "size" parameter specifies the size of the raw data stream, i.e. how many serialization words are available. It may be omitted if "raw" refers to a C-Array (an actual array, not a pointer to the first element), a std::array, or any container providing a ".size()" member function, in which case its size is used. The size may be specified in different ways:
When using uSer::FixedSize for the size parameter or when using an array type for "raw", µSer can check the buffer size at compile time. If it is too small (excluding dynamic data structures), µSer will emit an error via static_assert.
We have already seen how to serialize to arrays:
Serializing into a vector works similarly:
Note that the vector was initialized with 4 elements which are overwritten by serialize. If a number smaller than 4 were passed to std::vector's constructor, an exception would be thrown.
When "uSer::fixedSize<N>" is passed as the "size" parameter and the data structure contains no dynamic data, no runtime buffer range checks are performed, and no error handling is necessary:
We can pass a std::size_t or a uSer::DynSize as the size parameter to only (de)serialize a part of a container with a runtime-known size:
Since the size is not known in advance, error handling is required. When there is no explicit limit to the raw buffer size, we can pass "uSer::infSize" as the size argument. This can e.g. be used in combination with std::back_inserter to automatically grow the container as needed. Since µSer doesn't check the buffer size in this case, error handling is not needed unless other things require it (e.g. dynamic data structures). Unfortunately, the C++ Standard Library offers no way to determine the serialization word type from a given std::back_insert_iterator type. Therefore, we have to explicitly pass the serialization word to the serialize function via the uSer::RawInfo attribute:
This is also the first example where an iterator instead of a reference to a container is passed to deserialize.
The return type of serialize and deserialize depends on the parameters and is used to signal error conditions. If USER_EXCEPTIONS is defined before including the uSer.hh header, exceptions are used to signal errors, and the return type will always be void. If it was not defined, and an error is possible (e.g. when using dynamic data structures or dynamic buffer sizes), the return type will be uSer_ErrorCode to signal success (uSer_EOK) or failure. If no errors can happen, the return type will be void. The uSer_ErrorCode enum is defined as "[[nodiscard]]" - if you ignore the returned value, the compiler will emit a warning. If you use the return value even if no errors are possible, you will get a compiler error. This prevents both forgetting to include error handling and superfluous error handling code.
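How a return type can depend on whether an error is possible at all may be sketched with `if constexpr` and return type deduction. This is only an illustration of the technique and not µSer's code; `ErrorCode`, `Fixed`, `Dynamic` and `writeByte` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical error code type; [[nodiscard]] makes ignoring it a warning.
enum class [[nodiscard]] ErrorCode { Ok, BufferTooSmall };

// Hypothetical size tags: one known at compile time, one only at runtime.
template <std::size_t N> struct Fixed { static constexpr std::size_t value = N; };
struct Dynamic { std::size_t value; };

// Write one byte. Returns void if the size can be checked statically,
// and ErrorCode if it has to be checked at runtime.
template <typename Size>
auto writeByte(std::uint8_t* raw, Size size, std::uint8_t b) {
    assert(raw != nullptr);
    if constexpr (std::is_same_v<Size, Dynamic>) {
        if (size.value < 1) return ErrorCode::BufferTooSmall;
        raw[0] = b;
        return ErrorCode::Ok;
    } else {
        static_assert(Size::value >= 1, "raw buffer too small");
        raw[0] = b;  // no runtime check and no return value needed
    }
}
```

With `Fixed<1>` the call is a statement of type void, so code that tries to use a result does not compile; with `Dynamic` the caller is warned if the result is dropped.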
We have already seen how to use exceptions:
When using return codes, you can use uSer_getErrorMessage to retrieve a string describing the error:
If you were to ignore the returned value in this example, the compiler would issue a warning.
In C++, struct and class are actually almost the same thing: Both keywords declare classes (technically, there are no structs in C++) with the only difference that members and base classes of classes declared by "struct" are public by default, and private for those declared by "class". However, it is customary to use "struct" for simple classes that only contain "flat" data without encapsulation mechanisms such as getter/setter functions, similar to how they are used in C. Therefore, we will call these classes "structs". Typically, most serializable data in applications is declared in such structs, e.g. to contain data for network packets or file headers. Therefore, structs play a central role in µSer.
Unlike with containers, tuples and arrays, there is no way to automatically get a list of the members of a struct. This makes it necessary to explicitly make the list of members known to µSer.
The easiest way to do this is as follows:
First, the struct is defined as usual. Then a call to the USER_STRUCT macro is placed at the beginning. It requires one parameter, which is the name of the struct. There may be further parameters that define attributes valid for the whole struct. In this example, we don't provide any attributes. A strict compiler might emit a warning or error message because empty variadic macro parameters are not permitted by the C++ standard; in that case, we can pass uSer::AttrNone as a dummy meaning "no attributes".
Declaring attributes for a whole struct looks this way:
Now, all members of the struct are serialized as big endian.
A single member can be annotated by using USER_MEM_ANNOT :
The definition and annotation of individual members can be combined by using USER_MEM :
This method does not support arrays since their syntax requires parts of the type to be placed after the member name.
The most compact method to declare structs is provided by USER_DEF_MEM :
This variant defines annotations for all members in one step. Again, we may have to use uSer::AttrNone to prevent an empty variadic argument list. This variant does not support arrays either, but avoids typing every variable name twice.
The previously presented methods require the modification of the actual struct definition. They all add some member types and static member functions to the struct whose names start with "uSer_". If the struct definition cannot be changed (e.g. because it belongs to an external library) or adding those members is not desirable (even though they have no influence on the struct's runtime behaviour), structs can be defined in a non-intrusive fashion. The macros used to achieve this have to be used in the global namespace. If the struct is defined in some namespace, its name has to be fully qualified.
Attributes valid for the whole struct can optionally be defined by USER_EXT_ANNOT . If no attributes are needed, this macro can be omitted. Individual members can be annotated by USER_EXT_MEM_ANNOT . The only required macro is USER_EXT_ENUM_MEM, which defines the list of members.
This can also be done in a more compact way by using USER_EXT_DEF_MEM which defines and annotates all members at once. Attributes for the whole struct can again optionally be defined by USER_EXT_ANNOT .
It is actually possible to annotate any serializable type with USER_EXT_ANNOT . Annotating general types such as "int" or "std::vector<short>" is however discouraged, since the annotation will be valid for any serialization process and might affect the behaviour of other parts of the program.
An important aspect for serializing integers is the byte order. Integers that have more bits than a byte need to be split into multiple bytes for storage. Most platforms enforce a particular byte order by storing integers in memory in a certain way. If the data is copied from memory and written to a file directly, that byte order is applied to the stored data as well. If that file is copied to a computer with a different byte order, loaded into memory directly, and processed by arithmetic operations, unexpected results may occur. The most popular byte order is little endian, where the least significant byte is stored first and the most significant one last. This byte order is e.g. used by x86 and most ARM platforms. The reverse case is big endian, where the most significant byte is stored first. This is used by PowerPC and in various internet protocols. PDP endian is a hybrid variant, where 32-bit-Integers are split into two 16-bit-Integers, where the most significant one is stored first. The two 16-Bit-Integers are internally stored as little endian. This order stems from the PDP architecture.
For example, the number 305419896, or hexadecimal 0x12345678, is stored in the following orders (all numbers hexadecimal):
Byte Order | Address 0 | 1 | 2 | 3 |
---|---|---|---|---|
Little Endian | 78 | 56 | 34 | 12 |
Big Endian | 12 | 34 | 56 | 78 |
PDP Endian | 34 | 12 | 78 | 56 |
µSer offers functionality to convert the data from the local platform's byte order into a specific defined order and back. The application code defines a fixed order desired for raw binary data, i.e. the network protocol or file format. µSer automatically converts the C++ data in the local order to/from that order. Since this is done via bit-operations, the order of the local platform need not be explicitly known and could actually be an entirely different one.
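The three byte orders from the table above can be reproduced with plain shift operations, which is also why the host's own byte order does not matter. The following sketch does not use µSer; `Order` and `toBytes` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Hypothetical enum naming the three byte orders discussed above.
enum class Order { Little, Big, Pdp };

// Split a 32-bit integer into four bytes in the requested order.
// Only shifts and masks are used, so the result is host-independent.
constexpr std::array<std::uint8_t, 4> toBytes(std::uint32_t v, Order order) {
    const auto b0 = static_cast<std::uint8_t>(v & 0xFF);          // least significant
    const auto b1 = static_cast<std::uint8_t>((v >> 8) & 0xFF);
    const auto b2 = static_cast<std::uint8_t>((v >> 16) & 0xFF);
    const auto b3 = static_cast<std::uint8_t>((v >> 24) & 0xFF);  // most significant
    switch (order) {
        case Order::Little: return {{b0, b1, b2, b3}};
        case Order::Big:    return {{b3, b2, b1, b0}};
        case Order::Pdp:    return {{b2, b3, b0, b1}};  // high 16-bit half first, each half LE
    }
    return {{0, 0, 0, 0}};  // not reached for valid enum values
}
```

For 0x12345678 this yields 78 56 34 12, 12 34 56 78 and 34 12 78 56 respectively, matching the table.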
We have already seen how to use attributes to request a certain byte order:
If no byte order is defined, µSer assumes little endian. If we serialize this struct on one platform and then deserialize it on another, µSer guarantees that the data in the integers is correct, regardless of the byte orders of the two platforms.
When the Width attribute is used to define an integer size that is not a multiple of a byte, the "incomplete" byte is assumed to contain the remaining most significant bits. In big endian order, this incomplete byte then comes first. Consider this example:
The output is:
Like with byte orders, different platforms have different ways to store negative values. µSer supports three of those: Two's complement, One's complement and Signed-Magnitude.
The format two's complement takes the top half of the corresponding unsigned integer's range, and translates it into the negative numbers. For example, for a 16-Bit-Integer, the numbers 0-32767 stay as they are. The binary representation of the unsigned number 32768 means -32768 in two's complement, 32769 means -32767, and 65535 means -1. This is the most popular format used on most platforms, and the other two are rare.
The signed-magnitude format stores an absolute value and a sign bit that defines whether the value is positive or negative. This format corresponds to the usual decimal notation, where a sign bit of "1" means "-" and "0" means "+". Inverting the sign bit means negating the value. This format has two representations for zero, i.e. +0 and -0.
The one's complement format is similar to signed-magnitude but the absolute value is bitwise negated for negative values. This format has two representations for zero as well. Inverting all bits means negating the number.
Just like the byte order, the application defines the desired sign format, and µSer converts the local format from/to the desired one. This works independently of the host's format. The ranges of the different formats are not equal: For example, 16-bit two's complement numbers have the range -32768 to 32767. 16-bit-Integers in the other two formats have the range -32767 to 32767. µSer uses static assertions to make sure the integer in the raw data fits into the local format. For example, if the local platform uses signed-magnitude, and you try to deserialize a 16-bit-integer in two's complement into an int16_t, you will get a compiler error, since -32768 cannot be represented on the host platform.
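The three encodings can be compared with a short hand-written sketch for 16-bit values. It is independent of µSer; `SignFormat` and `encode16` are hypothetical names here, and the input is assumed to lie within the range of the chosen format.

```cpp
#include <cstdint>

// Hypothetical enum naming the three sign formats discussed above.
enum class SignFormat { TwosComplement, SignedMagnitude, OnesComplement };

// Encode a value as a 16-bit raw pattern in the given sign format.
// Assumes v is within the representable range of that format.
constexpr std::uint16_t encode16(std::int32_t v, SignFormat format) {
    const auto magnitude = static_cast<std::uint16_t>(v < 0 ? -v : v);
    switch (format) {
        case SignFormat::TwosComplement:
            // Conversion to unsigned is modulo 2^16, which is two's complement.
            return static_cast<std::uint16_t>(v);
        case SignFormat::SignedMagnitude:
            // Top bit is the sign, remaining bits hold the absolute value.
            return static_cast<std::uint16_t>(v < 0 ? (0x8000u | magnitude) : magnitude);
        case SignFormat::OnesComplement:
            // Negative values are the bitwise negation of the absolute value.
            return static_cast<std::uint16_t>(v < 0 ? ~magnitude : magnitude);
    }
    return 0;  // not reached for valid enum values
}
```

For the value -3 this gives 0xFFFD in two's complement, 0x8003 in signed-magnitude and 0xFFFC in one's complement; non-negative values look the same in all three.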
This example serializes a signed integer in all three sign formats:
The output is:
Different platforms support different integer sizes, but a protocol might require integers of a size for which no type exists. µSer allows setting a specific fixed size for an integer by using the Width attribute. The next integer will be serialized right after the previous one; integers need not start at a byte boundary. Essentially, this simulates bitfields in the binary data stream, without the need for actual C++ bitfields which behave in a non-portable way.
A simple example is to convert RGB565 color values into a single 16bit-Integer:
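Independently of µSer's API, the resulting bit layout can be written down by hand. The sketch assumes the members are stored in the order red (5 bits), green (6 bits), blue (5 bits), with the first member occupying the least significant bits as described for the bit stream; `Rgb`, `packRgb565` and `unpackRgb565` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical colour struct; only the low 5/6/5 bits of each member are used.
struct Rgb {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};

// Pack the three components into one 16-bit integer:
// bits 0-4 red, bits 5-10 green, bits 11-15 blue (assumed order).
constexpr std::uint16_t packRgb565(Rgb c) {
    return static_cast<std::uint16_t>((c.r & 0x1F) | ((c.g & 0x3F) << 5) | ((c.b & 0x1F) << 11));
}

// Inverse operation: extract the three components again.
constexpr Rgb unpackRgb565(std::uint16_t raw) {
    return Rgb{static_cast<std::uint8_t>(raw & 0x1F),
               static_cast<std::uint8_t>((raw >> 5) & 0x3F),
               static_cast<std::uint8_t>((raw >> 11) & 0x1F)};
}
```

Note that other RGB565 conventions place red in the most significant bits; which member ends up where depends on the order in which the members are declared.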
Some formats require unused bits between the individual values. These are ignored during reading, and typically written to zero. These unused bits are called padding bits. While it is possible to define "dummy" integers (possibly using uSer::Width to achieve a specific amount of bits) that receive the padding bits, this wastes memory. Therefore, µSer provides the Padding::Fixed attribute which can only be applied to integers, and specifies a fixed amount of padding bits that come after that integer.
Assuming we want to skip the "green" component from the previous color example, we can add 6 padding bits after the "red" one:
The remaining two values retain their position in the raw data, and the "gap" is filled with zero bits.
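The effect of padding bits on the stream can be sketched with a small bit writer that appends fields least significant bit first. `BitWriter` is a hypothetical helper and not part of µSer; it only mirrors the layout rule described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends bit fields to a byte stream, LSB first.
class BitWriter {
public:
    // Append the lowest `width` bits of `value`.
    void put(std::uint32_t value, unsigned width) {
        assert(width <= 32);
        for (unsigned i = 0; i < width; ++i) {
            if (bitPos_ % 8 == 0) bytes_.push_back(0);  // start a new byte
            if ((value >> i) & 1u) {
                bytes_.back() = static_cast<std::uint8_t>(bytes_.back() | (1u << (bitPos_ % 8)));
            }
            ++bitPos_;
        }
    }
    // Append `width` zero bits as padding.
    void pad(unsigned width) { put(0, width); }

    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t bitPos_ = 0;  // total number of bits written so far
};
```

Writing a 5-bit red value of 0x1F, six padding bits and a 5-bit blue value of 0x1F produces the bytes 0x1F and 0xF8: blue still starts at bit 11, and the gap is zero.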
µSer accesses the raw data stream via iterators. In C++, an iterator is a small class that "refers" to an element in a container or an input/output stream. Unlike a simple index, an iterator instance knows which container it refers to. A pointer is the simplest kind of iterator, to be used with C-Arrays. The first parameter to the serialize and deserialize functions is an iterator to the first raw data element (or a container/C-Array, of which the iterator to the first element will be queried using std::begin). µSer supports all standard library iterators that refer to unsigned integers, but it is possible to implement your own iterators to e.g. directly read/write the raw serialization words from/to some communication interface, network or file.
We can use std::istream_iterator and std::ostream_iterator to read/write raw data directly from/to files:
Note that RawInfo is needed for the output operator, since µSer can't automatically determine the serialization word type from std::ostream_iterator. The iterators are instantiated with "std::uint8_t" since µSer needs unsigned integers in the raw stream. The file size is queried ahead of deserialization, as the input iterator doesn't signal end-of-file to µSer.
µSer's requirements for iterators are actually weaker than the standard library's, making it easy to write your own ones. With "T" being the iterator type, "iter" an instance of "T", and "x" an instance of the serialization word, the basic requirements imposed by µSer are:
If the raw binary stream contains bytes that consist entirely of padding bits, µSer is able to skip over those efficiently. There are seven cases for different kinds of iterators depending on whether padding bytes exist, each with their own set of guarantees and requirements:
In this case, "T::iterator_category" is not used. µSer will call "*iter" and "++iter" in strictly alternating order, e.g.:
In this case, "T::iterator_category" is not used. µSer will call "(*iter)=x" and "++iter" in strictly alternating order, e.g.:
This makes it easy to implement both input/output iterators, since the actual read/write can be done via the "operator *" while not doing anything in "operator ++". An example for both is:
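A minimal sketch of such a pair of iterators is given here, under the assumption that only `*iter`, `(*iter)=x` and `++iter` are needed and that they are called in strictly alternating order. `QueueReader`, `VectorWriter` and the driver `copyWords` are hypothetical names; the driver merely imitates the call pattern and is not µSer code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical input iterator: every *iter pops the next word from a queue,
// ++iter does nothing. Only valid because reads and increments alternate.
class QueueReader {
public:
    explicit QueueReader(std::deque<std::uint8_t>& source) : source_(&source) {}
    std::uint8_t operator*() {
        assert(!source_->empty());
        const std::uint8_t word = source_->front();
        source_->pop_front();
        return word;
    }
    QueueReader& operator++() { return *this; }
private:
    std::deque<std::uint8_t>* source_;
};

// Hypothetical output iterator: (*iter) = x appends x, ++iter does nothing.
class VectorWriter {
public:
    // Proxy returned by operator*, performing the write on assignment.
    struct Proxy {
        std::vector<std::uint8_t>* sink;
        void operator=(std::uint8_t x) { sink->push_back(x); }
    };
    explicit VectorWriter(std::vector<std::uint8_t>& sink) : sink_(&sink) {}
    Proxy operator*() { return Proxy{sink_}; }
    VectorWriter& operator++() { return *this; }
private:
    std::vector<std::uint8_t>* sink_;
};

// Driver imitating the strictly alternating call pattern.
template <typename In, typename Out>
void copyWords(In in, Out out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t x = *in;
        ++in;
        (*out) = x;
        ++out;
    }
}
```

In a real application the queue and the vector would be replaced by the communication interface, e.g. a UART receive and transmit routine.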
In this case, "T::iterator_category" is not used. Two calls to "*iter" will never occur in direct succession, but multiple calls to "++iter" might directly follow each other to skip over padding bytes:
In this case, "T::iterator_category" is not used. With "n" being an instance of std::iterator_traits<T>::difference_type, "iter += n" should skip over n bytes. Two calls to "*iter" will never occur in direct succession, "++iter" will always follow "*iter", "iter += n" can optionally follow a "++iter", and "*iter" will follow either "++iter" or "iter += n", e.g.:
In this case, µSer will call "(*iter)=x" and "++iter" in strictly alternating order and padding bytes will be written as zero, e.g.:
This section applies for iterators where std::iterator_traits<T>::iterator_category is exactly or convertible to std::forward_iterator_tag. In this case, two calls to "*iter" will never occur in direct succession, but multiple calls to "++iter" might directly follow each other to skip over padding bytes:
This section applies for iterators where std::iterator_traits<T>::iterator_category is exactly or convertible to std::forward_iterator_tag. With "n" being an instance of std::iterator_traits<T>::difference_type, "iter += n" should skip over n bytes. Two calls to "(*iter)=x" will never occur in direct succession, "++iter" will always follow "(*iter)=x", "iter += n" can optionally follow a "++iter", and "(*iter)=x" will follow either "++iter" or "iter += n", e.g.:
Data structures are considered dynamic if their size is not known at compile time, but is determined at runtime based on data available only then. For deserialization, the size of a dynamic data structure might depend on other data objects that have just been deserialized. The serialized size of any dynamic data structure must be equal to or a multiple of the size of a serialization word. This restriction ensures that data that comes after the dynamic data always starts at the same bit in one serialization word, which greatly improves performance. For example, if you want to serialize a std::vector<std::uint16_t>, the serialization word can only be std::uint8_t or std::uint16_t, but not std::uint32_t. Since dynamic data structures always have the potential to overflow the raw buffer, error handling is required unless InfSize is specified as the buffer size.
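The size restriction can be expressed as a compile-time predicate. The sketch below is a simplification that assumes every element is serialized with its full native width (no Width attribute); `fitsSWord` is a hypothetical name.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical check: the serialized size of an element (assumed to be its
// full native width) must be a multiple of the serialization word size.
template <typename Element, typename SWord>
inline constexpr bool fitsSWord =
    (sizeof(Element) * CHAR_BIT) % (sizeof(SWord) * CHAR_BIT) == 0;
```

For a std::vector<std::uint16_t> this accepts std::uint8_t and std::uint16_t as the serialization word and rejects std::uint32_t, as in the example above.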
µSer supports three different kinds of dynamic data:
When serializing a container that is not std::array or a C-Array, µSer will assume it to be dynamic, and query its size by using its ".size()" member function. µSer will serialize or deserialize exactly as many elements. This means that prior to deserialization, the container needs to be set to its desired size, as µSer will never resize containers.
We have already seen how to serialize std::vector:
Deserialization then works like this:
Note how the vector is initialized to have 2 elements. Passing a greater integer will cause an exception as the buffer would overflow.
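The behaviour described in this note can be imitated by hand: the destination vector dictates how many elements are read, it is never resized, and too little raw data is reported as an error. `fillLE16` is a hypothetical helper, and std::length_error is merely a stand-in for whatever exception type the library throws.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: fill an already-sized vector from little endian
// raw bytes. The vector is never resized; if the raw data cannot supply
// out.size() elements, an exception is thrown before anything is written.
inline void fillLE16(const std::vector<std::uint8_t>& raw,
                     std::vector<std::uint16_t>& out) {
    if (raw.size() < 2 * out.size()) {
        throw std::length_error("raw buffer too small for requested element count");
    }
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint16_t>(raw[2 * i] | (raw[2 * i + 1] << 8));
    }
}
```

With four raw bytes, a vector of two elements is filled completely, while a vector of three elements triggers the exception.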
By annotating a container member of a struct with the Dyn::Size attribute, you can make its size depend on various kinds of runtime data. The sole argument to Dyn::Size is a reference which tells µSer what size the container has. The reference can be:
This example shows four variants:
By annotating a container member of a struct with the Dyn::Optional attribute, you can make its presence depend on various kinds of runtime data. The sole argument to Dyn::Optional is a reference which tells µSer whether the object exists. For the reference, the same rules as for Size apply, but the return type has to be (convertible to) bool.
This example shows four variants:
The variables N1-N4 are declared as std::uint8_t to avoid odd data in the example.
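The kind of wire format that Dyn::Size and Dyn::Optional describe can be illustrated by a hand-written parser for a hypothetical message: a count byte, that many item bytes, a flag byte, and one extra byte that is only present if the flag is non-zero. `Message` and `parseMessage` are invented for this sketch, which, unlike µSer, resizes its container for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical message with a runtime-sized part and an optional part.
struct Message {
    std::vector<std::uint8_t> items;    // length given by a preceding count byte
    std::optional<std::uint8_t> extra;  // present only if the flag byte is non-zero
};

// Parse the layout; returns std::nullopt if the raw data is too short.
inline std::optional<Message> parseMessage(const std::vector<std::uint8_t>& raw) {
    std::size_t pos = 0;
    if (pos >= raw.size()) return std::nullopt;
    const std::size_t count = raw[pos++];               // size known only now
    if (raw.size() - pos < count) return std::nullopt;
    Message msg;
    const auto first = raw.begin() + static_cast<std::ptrdiff_t>(pos);
    msg.items.assign(first, first + static_cast<std::ptrdiff_t>(count));
    pos += count;
    if (pos >= raw.size()) return std::nullopt;
    const bool hasExtra = raw[pos++] != 0;              // presence known only now
    if (hasExtra) {
        if (pos >= raw.size()) return std::nullopt;
        msg.extra = raw[pos++];
    }
    return msg;
}
```

Both decisions depend on values that were read a moment earlier from the same stream, which is why such structures cannot be size-checked at compile time.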
Sometimes it is necessary to do some application-specific calculations and checks during (de)serialization. µSer accommodates this by allowing the application to specify hook functions which will be called before or after an object is (de)serialized. This can be achieved by the four attributes Hook::SerPre, Hook::SerPost, Hook::DeSerPre, Hook::DeSerPost. These attributes take a reference to a function which will be called before/after the annotated object is (de)serialized. The argument may reference:
The return type can be:
A constant (for serialization) or non-constant (for deserialization) reference to the annotated object is passed as the first argument. For cases 2-4, a constant (for serialization) or non-constant (for deserialization) reference to the struct is passed as a second argument, if the attribute was applied to a struct member.
An example for the different reference types and hook types is:
Note that during the pre-deserialization hook, the object might not contain any meaningful data.
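The calling sequence of hooks around deserialization can be sketched by hand. The example is not based on µSer's hook attributes; `Packet`, `deserializePacket` and `checksumMatches` are hypothetical, and the bool result of the post hook is an assumption made for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical packet: three payload bytes followed by a checksum byte.
struct Packet {
    std::array<std::uint8_t, 3> payload{};
    std::uint8_t checksum = 0;
};

// Hand-written deserialization with a pre and a post hook. pre(p) runs
// before any data is written, post(p) afterwards; the result of post
// decides whether the packet is accepted.
template <typename PreHook, typename PostHook>
bool deserializePacket(const std::array<std::uint8_t, 4>& raw, Packet& p,
                       PreHook pre, PostHook post) {
    pre(p);  // p may still hold stale data here
    for (std::size_t i = 0; i < 3; ++i) p.payload[i] = raw[i];
    p.checksum = raw[3];
    return post(p);  // p is fully populated here
}

// Example post hook: the checksum must equal the payload sum modulo 256.
inline bool checksumMatches(const Packet& p) {
    const unsigned sum = p.payload[0] + p.payload[1] + p.payload[2];
    return static_cast<std::uint8_t>(sum) == p.checksum;
}
```

A post-deserialization hook is the natural place for such validation, since it is the first point at which all members hold the received values.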
µSer allows you to determine the size of the raw buffer based on the C++ data structure during compilation. This can be useful to allocate buffers of appropriate size.
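The general idea of sizing a buffer at compile time can be sketched with a fold expression. This does not show µSer's own facility for the purpose; `rawBytes` and `wordsForBits` are hypothetical names, and the first helper assumes every type is serialized with its full native width.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: total number of bytes needed for a list of types,
// assuming each is serialized with its full native width.
template <typename... Ts>
inline constexpr std::size_t rawBytes = (sizeof(Ts) + ... + 0);

// Hypothetical helper: number of serialization words needed to hold
// `bits` bits, rounding up to whole words.
constexpr std::size_t wordsForBits(std::size_t bits, std::size_t wordBits) {
    return (bits + wordBits - 1) / wordBits;
}

// A buffer whose size follows from the data types at compile time.
using ExampleBuffer =
    std::array<std::uint8_t, rawBytes<std::uint16_t, std::uint32_t, std::uint8_t>>;
```

Because the size is a constant expression, the buffer can live on the stack or in static storage, in line with the goal of avoiding dynamic memory allocation.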
µSer is designed to work on resource-constrained systems, such as small embedded systems. To this end, µSer never does any dynamic memory allocation. There are also some macros that can be defined before including the uSer.hh header (or via the compiler's command line) to configure µSer:
When using GCC, compile with "-fdata-sections -ffunction-sections -flto", and link with "-Wl,--gc-sections -flto". These options can reduce the amount of program memory needed. µSer also relies on optimization to be turned on (e.g. -O2 for GCC and Clang) to generate efficient code. µSer employs deeply nested call stacks to statically build algorithms specifically adapted to the user-defined data structures. With optimization enabled, the compiler collapses those into short and efficient algorithms. If some code does not work with optimizations enabled, this is most probably the result of relying on some undefined behaviour, i.e. programming errors. Using µSer instead of e.g. pointer casts to serialize data already avoids some of those problems (specifically padding, data alignment, aliasing rules).
µSer can be used in C-based projects if a C++17-compatible compiler is available. First, define the desired data structures in a common header for both C and C++ (here: packet.h):
Define the structs by using the µSer-provided macros as explained before. Also declare serialize/deserialize functions as needed, i.e. for the structs we want to explicitly serialize from C code. In this example, we only want to (de)serialize PacketB from C, and have the contained PacketA instances (de)serialized automatically. Therefore, we don't need serialization functions for PacketA. Adjust the signature as needed (with or without error codes in the return type, the desired raw buffer type, with or without a buffer size parameter). In order to call a C++ function from C, it has to be annotated with 'extern "C"', but only when the C++ compiler sees it. The USER_EXTERN_C macro can be used for this, which evaluates to 'extern "C"' when compiling as C++, and to nothing when compiling as C.
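The linkage mechanism itself can be shown in isolation. In the sketch below, `MY_EXTERN_C` is a hypothetical stand-in that behaves as described for USER_EXTERN_C, `PacketDemo` and `packetDemoSerialize` are invented names, and the function body is written by hand instead of calling µSer.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the macro described above: expands to
   extern "C" only when a C++ compiler processes the header. */
#ifdef __cplusplus
#  define MY_EXTERN_C extern "C"
#else
#  define MY_EXTERN_C
#endif

/* Hypothetical plain struct shared between C and C++. */
struct PacketDemo {
    uint16_t id;
    uint8_t flags;
};

/* Declaration as it would appear in the shared header. */
MY_EXTERN_C int packetDemoSerialize(uint8_t* raw, size_t size,
                                    const struct PacketDemo* p);

// Definition as it would appear in the C++ source file.
// Returns 0 on success and 1 if the raw buffer is too small.
MY_EXTERN_C int packetDemoSerialize(uint8_t* raw, size_t size,
                                    const struct PacketDemo* p) {
    assert(raw != nullptr && p != nullptr);
    if (size < 3) return 1;
    raw[0] = static_cast<uint8_t>(p->id & 0xFF);  // id, little endian
    raw[1] = static_cast<uint8_t>(p->id >> 8);
    raw[2] = p->flags;
    return 0;
}
```

The C headers `<stdint.h>` and `<stddef.h>` are used on purpose so that the declaration part compiles unchanged as both C and C++.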
Then, create a C++ source file for implementing the serialization functions:
We have to include the "packet.h" file, and just call serialize/deserialize in the function body. Remember to switch on the compiler's C++17 support (e.g. -std=c++17 for GCC and Clang). We can then use these functions from C code, e.g.: