binary-tools.js

The fastest, most complete binary parser for JavaScript.

Buy now: $80.00 | LGPL | PayPal: [email protected] | PIX: [email protected]. You will receive a download link as soon as I confirm the payment.

↪ Free download CC-BY-NC-ND-4.0 ↪ Run the browser tests

Dealing with bytes should be a pleasure.

binary-tools.js eases the task of dealing with raw binary data by using a simple interface built upon conventions you are likely to be familiar with already and an extensive list of real-world types. Read the complete documentation for binary-tools.js below.

The fastest binary parser for JavaScript.

See all available releases

Download the latest release:

binary-tools-1.12.0.tgz CC-BY-NC-ND-4.0 SHA-1: e78d9ab8c9377047e8dac9d00662048bb04c0550 MD5: be30dcf95c6acf8fb43b9f159ba8f44f

binary-tools is a JavaScript binary parser for any browser or environment.

  • Faster than TypedArrays and DataViews for large datasets
  • Available both as an ES6 module and as ES3/CommonJS
  • Compatible with IE6+ and any environment with JavaScript support *
  • Native support for 64, 128, 256, 512 and 1024-bit integers! *
  • Zero dependencies!
  • No polyfills!
  • Native support for the bfloat16 brain floating point format
  • Native support for the 16-bit half-precision floating point format
  • Native support for 24-bit, 40-bit and 48-bit integers
  • Supports user-defined integer types (like Uint12)
  • Supports ASCII and ISO 8859-1 strings and characters
  • Supports UTF-8, UTF-16 and UTF-32 strings
  • Tailored for real-life use cases
  • Less than 25kb minified!

*Packing and unpacking 64-bit, 128-bit, 256-bit, 512-bit and 1024-bit integers is only available in environments with native support for BigInt (ECMAScript 2020). The same goes for any user-defined integer type wider than 53 bits. You don't need to worry about this unless you plan to pack or unpack integers wider than 53 bits.

Tests

You can run the tests in your browser:
https://rochars.com/binary-tools/test/dist/browser

The tests above use mocha, which requires some polyfills that may mask how this module works in older browsers. The link below runs binary-tools.js without loading any file other than binary-tools.js itself (only a few functions are executed; the test suite is not used):
https://rochars.com/binary-tools/test/dist/es3

This library is tested using datasets with thousands of packed/unpacked values generated by Python's struct module. 8-bit and 16-bit integers are fully tested with all possible values, signed and unsigned. Other types are tested with up to tens of thousands of test values per type plus special cases (Infinity, -Infinity, NaN, overflows and so on).

Unicode strings are tested using datasets with all the code points in the Unicode code space, along with other tests.

I'm constantly adding new tests (redundant or not) to ensure the correctness of this module. The browser tests available on https://rochars.com/binary-tools do not cover all the tests executed during the build process, as some of the dataset files used for testing are way too large. The paid versions of binary-tools include all the tests, datasets and also the full source code (except for the CC-BY-ND version).

Install

Download it from https://rochars.com/binary-tools.

You may also use a package manager:

npm install https://rochars.com/[email protected]

Or load it as a module from a CDN:


import * as btools from "https://rochars.com/[email protected]/index.js";

My CDN is fast, safe and reliable. It is powered by the kind folks at Cloudflare. Use it.

Using in Node.js

The examples below assume you already installed binary-tools as a dependency in your project.


const btools = require('binary-tools');

// Pack a signed 16-bit integer; returns an
// array with the number represented as bytes
let packed = btools.pack('h', -32765);
// packed == [3, 128]

let unpacked = btools.unpack('h', [255, 127]);
// unpacked == [32767]

// Pack a signed 16-bit integer to an existing byte buffer.
// Start writing at index 4 of the buffer
let buffer = new Uint8Array(12);
btools.packTo('h', [1077], buffer, 4);

Or use ES6 imports if your software is a module:

import * as btools from 'binary-tools';

let packed = btools.pack('h', -32765);

Using in the Browser

Use the binary-tools.js file in the /dist folder:


<script src="./binary-tools/dist/binary-tools.js"></script>
<script>
  // Pack a 32-bit floating-point number
  var packed = btools.pack('f', 2.1474836);
</script>

If you are targeting only modern browsers you may use the ./index.js file in the root folder instead of ./dist/binary-tools.js. The ./index.js file has ES6-style exports.

Browser compatibility

The file ./dist/binary-tools.js is transpiled to ES3 and is compatible with IE6+. It should work in all modern browsers as well.

No polyfills are used or needed unless you need to work with 64-bit integers or any other integer type wider than 53 bits. Packing and unpacking 64-bit numbers is only available with engines that support ECMAScript 2020.

Cross-browser tests powered by Browserstack.

Using in Deno / ES6 module

Use the ./index.js file in the root folder:


import * as btools from "./binary-tools/index.js";

let packed = btools.pack('h', -32765);
// packed == [3, 128]

Or load it from a CDN:


import * as btools from "https://rochars.com/[email protected]/index.js";

let packed = btools.pack('h', -32765);
// packed == [3, 128]

Quick Reference

Difference between pack/unpack and write/read functions

The packing functions pack() and packTo() pack single values or structs, either returning an array with the values as bytes (in the case of pack()) or writing the packed values directly to a byte buffer (the case of packTo()).

Unpacking with unpack() or unpackTo() reads a single value or a struct from a buffer and either returns the values in an array (even if it is a single value) or writes the results to a passed Array or TypedArray (in the case of unpackTo()).

Packing and unpacking functions can work with format strings made of many types (like 'hlhlhddd' or 'hlhlhtx4h').


The writing functions write one or many elements of a single type to a byte buffer (in the case of writeTo()) or return an array with the elements represented as bytes (in the case of write()).

Reading with read() or readTo() reads one or many values of a single type from a byte buffer and either returns the values in an array (in the case of read()) or writes the values directly to a TypedArray or Array (in the case of readTo()).

Reading and writing functions only work with a single type at a time (like 'h' or '>f'). Any other type present in the format string after the first one will be ignored ('>Hhh' will be handled as '>H', for example). Byte order operators work the same way they do in the packing/unpacking functions.

Reading and writing functions do not use repeat count numbers. Repeat count numbers are just for packing and unpacking functions. read() and readTo() use optional start and end params to specify a slice of the input buffer for reading. If no start and end are specified, then the reading will cover the entire buffer. If just the start param is specified, then it will read from that index until the end of the buffer.

write() and writeTo() always write the entire input array of values, and writeTo() has an optional index param that specifies the index in the output buffer at which to start writing.


In summary

pack() and unpack() are for multiple types at the same time, write() and read() are for many values of a single type.

If you are working with long sequences of values of the same type, read/write functions are much faster, and probably what you should use.

If you are working with file headers or other types of pre-defined structures of many different types, or a small number of values, then pack/unpack functions are more practical, and probably what you should use.

pack() and unpack() may use repeat count numbers to determine the number of values; read() and write() don't, as they always read and write the entire buffer or array (or a slice of it).


Types, operators and format strings

Format strings specify the types being packed/unpacked and how to pack or unpack them. Format strings for packing or unpacking structures may look like this:

- 'HhHdH'

where each character represents a type of data.

The operators

A format string may be preceded by operators. A format string with an operator may look like this:

- '>HhHdH'

where the > is a byte order operator that indicates big endian. Byte order, size and alignment operators may only appear at the start of the format string; the % and & padding operators go at the end.

The size and alignment operators:

The size and alignment operators are used to both align the data and set the proper size of some types.

The operators are:

  • '' - no operator; default sizes will be used and no alignment will be made
  • '~' - the same as no operator
  • '@' - non-standard types are padded so that they align as the closest C type that fits their size. If a value is provided for the @ operator then data will be aligned according to that value.
  • '#' - Similar to @, but uses a different set of rules.

The @ and # operators may be used in combination with a value to better specify the architecture. The value must appear immediately before the operator in the format string and must be a multiple of 8.


// No value for @; will only enforce all data is aligned
// as standard C types
btools.pack('@Ibhb', [1, 2, 3, 4]);

// Set 32-bit; enforce all data is aligned as standard
// C types and align the data to fit a 32-bit architecture
btools.pack('32@Ibhb', [1, 2, 3, 4]);

// Set 64-bit; enforce all data is aligned as standard
// C types and align the data to fit a 64-bit architecture
btools.pack('64@Ibhb', [1, 2, 3, 4]);

If the @ or # operators are used with a value then data will be aligned according to the defined architecture. The size of some standard types may change depending on the value used for the operator. As of version 1.12.x, only types 'l' and 'L' have sizes that may vary according to the architecture:


// 32-bit architecture; type L will use 4 bytes
btools.pack('32@Ld', [1, 2]);

// 64-bit architecture; type L will use 8 bytes
// Note that we need to change the JavaScript type
// from a regular Number to a BigInt:
btools.pack('64@Ld', [1n, 2]);

// With the # operator, types l and L always use 4 bytes:
btools.pack('32#Ld', [1, 2]);
btools.pack('64#ld', [1, 2]);

Structs will always be padded at the end as needed to fit the value of the alignment operator.


// Defining a 32-bit architecture:
btools.pack('32@Ibhb', [1, 2, 3, 4]);
// will output [1, 0, 0, 0,   2, 0, 3, 0,   4, 0, 0, 0]
// In Python this would be the same as doing
// struct.pack("@Ibhb0i", 1, 2, 3, 4) on a 32-bit system.
// Note the 0i in the end of the Python's format string to
// enforce padding until the full size is reached

// The same data, now with a 64-bit architecture:
btools.pack('64@Ibhb', [1, 2, 3, 4]);
// will output [1, 0, 0, 0, 2, 0, 3, 0,   4, 0, 0, 0, 0, 0, 0, 0]

The % operator can be used at the end of the format string to prevent padding at the end:


// Using % to prevent padding at the end:
btools.pack('64@bhb%', [1, 1, 1]);
// will output [1, 0, 1, 0, 1]

// With no %
btools.pack('64@bhb', [1, 1, 1]);
// will output [1, 0, 1, 0, 1, 0, 0, 0]

The & operator is used to force padding at the end, which is already the default behavior. Using & is the same as using nothing:


// Using & to make padding explicit:
btools.pack('64@bhb&', [1, 1, 1]);
// will output [1, 0, 1, 0, 1, 0, 0, 0]

The presence of the @ or # operators in a format string will force non-standard types to be aligned according to the closest standard C type that fits their size, but their size won't change.

Since alignment is based on the behavior of C compilers, alignment operators are normally used with types that are either compatible with C types or at least have a C type that is a close relative:


// 32-bit architecture:
btools.pack('32@Ibhb', [1, 2, 3, 4]);
// will output [1, 0, 0, 0,   2, 0, 3, 0,   4, 0, 0, 0]

// 64-bit architecture:
btools.pack('64@Ibhb', [1, 2, 3, 4]);
// will output [1, 0, 0, 0, 2, 0, 3, 0,   4, 0, 0, 0, 0, 0, 0, 0]

// Setting the width to 16-bit enforces even (2-byte) alignment between members:
btools.pack('16@Ibhb', [1, 2, 3, 4]);
// will output [1, 0, 0, 0,   2, 0,   3, 0,   4, 0]

// Without operator, types are packed with no padding at all and
// no sizes are changed:
btools.pack('Ibhb', [1, 2, 3, 4]);
btools.pack('~Ibhb', [1, 2, 3, 4]); // the ~ operator is the same as no operator
// will both output [1, 0, 0, 0,   2,   3, 0,   4]

The alignment operators may be used with the non-standard types defined by binary-tools, too. They cause each non-standard type to be aligned as the closest C type with a size that fits:


// 32-bit architecture, using a 3-byte type:
btools.pack('32@tbhb', [1, 2, 3, 4]);
// will output [1, 0, 0, 0,   2, 0, 3, 0,   4, 0, 0, 0]
// the 3-byte 't' type was padded to be aligned in a 4-byte boundary

// 64-bit architecture:
btools.pack('64@tbhb', [1, 2, 3, 4]);
// will output [1, 0, 0, 0, 2, 0, 3, 0,   4, 0, 0, 0, 0, 0, 0, 0]

// Set width to 8-bit will just enforce data to fit in standard sizes;
// no alignment is done, but the 3-byte type is padded to fit 4 bytes
btools.pack('8@tbhb', [1, 2, 3, 4]);
btools.pack('@tbhb', [1, 2, 3, 4]); // no value for @, the same as 8@
// will both output [1, 0, 0, 0,   2,   3, 0,   4]

// Without operator, types are packed with no padding at all and
// no sizes are changed:
btools.pack('tbhb', [1, 2, 3, 4]);
btools.pack('~tbhb', [1, 2, 3, 4]); // the ~ operator is the same as no operator
// will both output [1, 0, 0,   2,   3, 0,   4]

Note that even if the alignment of non-standard types is adjusted, their sizes won't change. When using those non-standard types it is assumed that applications consuming the data agree on the standards in use when reading and writing the non-standard types.

Writing and reading

The @ and # operators can be used with writing and reading functions, too; in that case they will enforce that non-standard types are aligned as standard types when writing, and will consider the standard type alignment when reading:


// Writing an array of values of a 3-byte type:
btools.write('@t', [1, 2, 3, 4]);
// will output [1, 0, 0, 0,   2, 0, 0, 0,   3, 0, 0, 0,   4, 0, 0, 0]
// the 3-byte 't' type was padded to be aligned in a 4-byte boundary

// Reading an array of values of a 3-byte type:
btools.read('@t', [1, 0, 0, 0,   2, 0, 0, 0,   3, 0, 0, 0,   4, 0, 0, 0]);
// will output [1, 2, 3, 4]
// the padding was considered when reading a 3-byte type

Alignment

With the @ operator data will be aligned according to these standards:

  • single-byte types will be 1-byte aligned
  • two-byte types will be 2-byte aligned
  • four-byte types will be 4-byte aligned
  • eight-byte types will be 4-byte aligned on 32-bit architectures and 8-byte aligned on 64-bit architectures
  • types that use more than 8 bytes will be aligned to the nearest multiple of 8

Examples:


// Packing a struct with 3 values: the first uses
// 1 byte, the second uses 2 bytes and the third
// uses 8 bytes. Aligning as 32-bit:
btools.pack('32@bhd', [1, 2, 3]);
// will output [1, 0, 2, 0,   0, 0, 0, 0,   0, 0, 8, 64]
// The 8-byte type 'd' alignment is 4 bytes.

// The same data, now 64-bit:
btools.pack('64@bhd', [1, 2, 3]);
// will output [1, 0, 2, 0, 0, 0, 0, 0,   0, 0, 0, 0, 0, 0, 8, 64]
// The 8-byte type 'd' alignment is 8 bytes
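The offsets in the examples above can be modeled in a few lines of plain JavaScript. This is a minimal sketch, not binary-tools code: layoutSketch is a hypothetical helper, and it assumes that each member is aligned to the smaller of its own size and the architecture width, and that the struct is padded at the end to the architecture width. Under those assumptions it reproduces the documented layouts for standard 1, 2, 4 and 8-byte types; it does not model non-standard types such as the 3-byte 't'.

```javascript
// Hypothetical helper, not part of binary-tools.
// sizes: member sizes in bytes; archBits: the value used with '@' (e.g. 32 or 64).
function layoutSketch(sizes, archBits) {
  const archBytes = archBits / 8;
  const offsets = [];
  let offset = 0;
  for (const size of sizes) {
    // Assumed rule: align to min(member size, architecture width)
    const align = Math.min(size, archBytes);
    offset = Math.ceil(offset / align) * align;
    offsets.push(offset);
    offset += size;
  }
  // Assumed rule: pad the end of the struct to the architecture width
  const total = Math.ceil(offset / archBytes) * archBytes;
  return { offsets, total };
}

// 'bhd' has member sizes of 1, 2 and 8 bytes
const layout32 = layoutSketch([1, 2, 8], 32); // offsets [0, 2, 4], total 12
const layout64 = layoutSketch([1, 2, 8], 64); // offsets [0, 2, 8], total 16
```

The only difference between the two results is where the 8-byte member starts, which matches the 12-byte and 16-byte outputs shown for '32@bhd' and '64@bhd'.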

With the # operator data will be aligned following the same rules used for the @ operator except for the following case:

  • 8-byte types will be 8-byte aligned in both 32-bit and 64-bit

Examples:


// 8-byte types are 8-byte aligned in both 32 and 64-bit:

btools.pack('32#Id', [1, 1.1]);
// will output [1, 0, 0, 0, 0, 0, 0, 0,  154, 153, 153, 153, 153, 153, 241, 63]

btools.pack('64#Id', [1, 1.1]);
// will output [1, 0, 0, 0, 0, 0, 0, 0,  154, 153, 153, 153, 153, 153, 241, 63]

Sizes

With the @ operator sizes will be adjusted according to these standards:

  • Types 'l' and 'L' will use 8 bytes for any architecture equal to or greater than 64-bit, 4 bytes otherwise. The default size for 'l' and 'L' is 4 bytes. Note that this causes the JavaScript type used for 'l' and 'L' to change to BigInt.

With the # operator sizes will be adjusted according to these standards:

  • Types 'l' and 'L' will use 4 bytes for any architecture.

The types 'i' and 'I' always use 4 bytes regardless of architecture.

As of version 1.12.x, only types 'l' and 'L' have sizes that may change according to the architecture.

The byte order operators

Byte order operators indicate the byte order of the data, big endian or little endian.

  • '' - no operator; little endian, UTF-16 and UTF-32 endianness determined by BOM
  • '=' - little endian, UTF-16 and UTF-32 endianness determined by BOM
  • '<' - little endian
  • '>' - big endian
  • '!' - big endian, UTF-16 and UTF-32 endianness determined by BOM

Byte order operators may be used alone or with alignment operators. If using alignment operators then the alignment operator must always come first in the string:


// Using > to indicate big endian
btools.write('@>t', [1, 2, 3, 4]);
// will output [0, 0, 1, 0,   0, 0, 2, 0,   0, 0, 3, 0,   0, 0, 4, 0]

The standard type codes:

The types below can be easily mapped to C types, apart from some implementation details:

Code  Type                                            Size
x     Pad byte                                        1 byte
c     Single ASCII character                          1 byte
C     Single ISO-8859-1 character                     1 byte
s     ASCII string                                    1 byte per character
S     ISO-8859-1 string                               1 byte per character
b     Signed 8-bit integers                           1 byte
B     Unsigned 8-bit integers                         1 byte
?     Boolean                                         1 byte
h     Signed 16-bit integers                          2 bytes
H     Unsigned 16-bit integers                        2 bytes
i     Signed 32-bit integers                          4 bytes
I     Unsigned 32-bit integers                        4 bytes
l     Signed 32-bit or 64-bit integers                either 4 or 8 bytes
L     Unsigned 32-bit or 64-bit integers              either 4 or 8 bytes
f     32-bit Single-precision floating-point numbers  4 bytes
d     64-bit Double-precision floating-point numbers  8 bytes
q     64-bit signed integers                          8 bytes
Q     64-bit unsigned integers                        8 bytes

The non-standard type codes:

The types below do not map to C primitives; they are defined here for convenience.

Code  Type                                          Size
u     UTF-8 string                                  1 to 4 bytes per character
U     UTF-16 string                                 2 or 4 bytes per character
V     UTF-32 string                                 4 bytes per character
N     Unsigned 4-bit integers                       1 byte per pair
e     16-bit Half-precision floating-point numbers  2 bytes
g     16-bit Brain floating point numbers           2 bytes
t     Signed 24-bit integers                        3 bytes
T     Unsigned 24-bit integers                      3 bytes
j     Signed 40-bit integers                        5 bytes
J     Unsigned 40-bit integers                      5 bytes
k     Signed 48-bit integers                        6 bytes
K     Unsigned 48-bit integers                      6 bytes
A     128-bit unsigned integers                     16 bytes
Y     256-bit unsigned integers                     32 bytes
O     512-bit unsigned integers                     64 bytes
M     1024-bit unsigned integers                    128 bytes

A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

In versions prior to 1.4.2 repeat count numbers for padding bytes in the format string do not work properly. You should not use repeat count numbers for padding bytes in any version prior to 1.4.2 - so instead of 'h2xl' you should use 'hxxl'. This is fixed in 1.4.2.

Repeat count numbers are just used for packing and unpacking functions pack(), unpack(), packTo() and unpackTo(). The reading functions read() and readTo() use optional start and end params to specify a slice of the input buffer for reading, and writing functions write() and writeTo() always write the entire input array of values.

Notice that while this module uses some names and conventions based on Python's struct module, it is not a JavaScript re-implementation of Python's struct module, nor does it intend to be. Some types (like 'c') work in a different way than they do in Python, for example.

Some considerations:

  • pad bytes 'x' are always packed as 0 and ignored when unpacking
  • boolean '?' always pack values as either 0 or 1, and unpacks as either true or false
  • a repeat count of 0 in the format string is ignored ('0h' is the same as '1h', which is the same as 'h').
  • repeat counts placed at the end of the string will be ignored
  • the types 'c' and 'C' represent single characters. A repeat count for types 'c' and 'C' gives the number of single characters in sequence: the format string '5c' on unpack [97,97,97,97,97] will return ['a','a','a','a','a'].
  • the types 's' and 'S' represent strings. A repeat count for types 's' and 'S' gives the number of characters in the string: the format string '5s' on unpack [97,97,97,97,97] will return 'aaaaa'.
  • when packing or unpacking strings of single-byte characters (types 's' and 'S'), each code in the format string represents an independent string. The string size is always determined by the repeat count, and sequential 's' or 'S' characters mean independent strings.
  • when packing, strings are padded with null bytes as needed to make them fit.
  • Integers greater than 53-bit (like types q and Q) only work in environments that support BigInt.
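The repeat count rules above can be illustrated with a small stand-alone function. This is a minimal sketch, not the parser used by binary-tools: expandRepeatCounts is a hypothetical helper that only covers numeric type codes and pad bytes. It does not model 's' and 'S' (where the count is a string length), and it assumes the format string has no size or alignment value such as '32@'.

```javascript
// Hypothetical helper, not part of binary-tools.
// Expands repeat counts for non-string type codes only.
function expandRepeatCounts(format) {
  let out = '';
  // Each match is an optional count followed by a single non-digit code
  const tokens = format.matchAll(/(\d*)(\D)/g);
  for (const [, count, code] of tokens) {
    // A missing count or a count of 0 is treated as 1
    const n = count === '' ? 1 : Math.max(1, parseInt(count, 10));
    out += code.repeat(n);
  }
  // A count at the end of the string has no code after it, so it is dropped
  return out;
}

const expanded = expandRepeatCounts('4h');   // 'hhhh'
const padded = expandRepeatCounts('h2xl');   // 'hxxl'
```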

All type codes map to objects like this:


const int32 = {
  bits: 32, // number of bits used by the number, required
  signed: true // optional, true for signed integers, default is false
  // , ...
};

You may define your own integer types:


// Define a 12-bit signed integer
btools.types.z = { bits: 12, signed: true };
// Use your code just like you would use a default one:
btools.pack('z', 257);

Types with bits=0 and bits=1 are reserved for pad bytes 'x' and booleans '?', respectively.
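The bits and signed fields are enough to derive the range of values a user-defined type can hold. A minimal sketch, assuming a type object shaped like the ones above; integerRange is a hypothetical helper, not a binary-tools function, and it uses regular Numbers, so it is only exact for types up to 53 bits.

```javascript
// Hypothetical helper, not part of binary-tools.
// Returns the smallest and largest value for an integer type descriptor.
function integerRange(type) {
  if (type.signed) {
    return {
      min: -Math.pow(2, type.bits - 1),
      max: Math.pow(2, type.bits - 1) - 1
    };
  }
  return { min: 0, max: Math.pow(2, type.bits) - 1 };
}

const int12 = integerRange({ bits: 12, signed: true }); // { min: -2048, max: 2047 }
const uint12 = integerRange({ bits: 12 });              // { min: 0, max: 4095 }
```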

Byte order

To set the byte order to big endian, use the '>' or the '!' operator:


let packed = pack('>HH', [1, 1]);
// will return [0, 1, 0, 1]

To set to little endian, use the '<' or the '=' operator:


let packed = pack('<HH', [1, 1]);
// will return [1, 0, 1, 0]

The byte order operators '=' and '<' and the byte order operators '!' and '>' work exactly the same for all types except UTF-16 and UTF-32 strings.

For UTF-16 and UTF-32 strings, '<' and '>' enforce endianness regardless of a BOM (and cause errors to be thrown if a BOM is present and contradicts the endianness defined in the format string), while '!' and '=' assume big-endian and little-endian, respectively, but are overridden by the byte order mark (BOM) if a BOM is present in the string.

If using '<' or '>' with UTF-16 or UTF-32, a BOM will always be added to the string when packing if one is not already present; operators '!' and '=' do not automatically include a BOM when packing UTF-16 or UTF-32.
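For reference, a UTF-16 BOM is the code point U+FEFF, which appears as the bytes FF FE in little endian and FE FF in big endian. A minimal sketch of BOM detection follows; detectUtf16Bom is a hypothetical helper, not a binary-tools function, and it only looks at UTF-16 (the UTF-32 little-endian BOM also starts with FF FE, so a real parser has to know which encoding it is reading).

```javascript
// Hypothetical helper, not part of binary-tools.
// Returns 'little', 'big', or null when no UTF-16 BOM is found.
function detectUtf16Bom(bytes) {
  if (bytes.length >= 2) {
    if (bytes[0] === 0xFF && bytes[1] === 0xFE) return 'little';
    if (bytes[0] === 0xFE && bytes[1] === 0xFF) return 'big';
  }
  return null;
}

// The letter 'a' as UTF-16 with a BOM, in both byte orders
const littleOrder = detectUtf16Bom([0xFF, 0xFE, 0x61, 0x00]); // 'little'
const bigOrder = detectUtf16Bom([0xFE, 0xFF, 0x00, 0x61]);    // 'big'
const noBom = detectUtf16Bom([0x61, 0x00]);                   // null
```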

The byte order operator may be omitted and will default to '=', which is little-endian regardless of the endianness of the host machine:


let packed = pack('H', 1);
// will return [1, 0] the same as
// pack('=H', 1)
// or
// pack('<H', 1);
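If you want to cross-check the byte orders above without binary-tools, the standard DataView API produces the same bytes for 16-bit values. This is a minimal sketch; uint16Bytes is a hypothetical helper written only for this comparison.

```javascript
// Uses only the standard DataView API; binary-tools is not involved.
function uint16Bytes(value, littleEndian) {
  const view = new DataView(new ArrayBuffer(2));
  view.setUint16(0, value, littleEndian);
  return [view.getUint8(0), view.getUint8(1)];
}

const little = uint16Bytes(1, true);  // [1, 0], like '<H'
const big = uint16Bytes(1, false);    // [0, 1], like '>H'
```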

Byte order operators placed anywhere except the start of the format string will be ignored.

Pad bytes

Pad bytes (type 'x'), when present in the format string, don't need (and should not have) a matching element in the array of values. Pad bytes are packed as 0. Pad bytes are only used with packing and unpacking functions. They should not be used with writing and reading functions.


Bytes as hex strings

To work with bytes as hex strings, use hexToBytes() and bytesToHex() to format your inputs and outputs:


btools.hexToBytes('ffff00');
// returns [255, 255, 0]
btools.bytesToHex([255, 255, 0]);
// returns 'ffff00'

hexToBytes() always returns an Array. The bytesToHex() input may be an Array or a typed array, and it always returns a string.

Byte arrays with invalid values will cause a RangeError to be thrown:


btools.bytesToHex([-1, 256, 0]);
// throws a RangeError

Invalid hex strings will throw errors:


btools.hexToBytes('fffff'); // odd number of characters
// throws an 'Invalid hex string' Error

btools.hexToBytes('ffffxf'); // invalid character
// throws an 'Invalid hex string' RangeError
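The conversions described in this section are simple enough to sketch in plain JavaScript. The two functions below are hypothetical stand-ins written for illustration, not the binary-tools implementation; in particular, the error types and messages they use are assumptions and may differ from what the library throws.

```javascript
// Hypothetical stand-ins, not the binary-tools implementation.
function hexToBytesSketch(hex) {
  // Reject odd lengths and anything that is not a hex digit
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new Error('Invalid hex string');
  }
  const bytes = [];
  for (let i = 0; i < hex.length; i += 2) {
    bytes.push(parseInt(hex.slice(i, i + 2), 16));
  }
  return bytes;
}

function bytesToHexSketch(bytes) {
  let hex = '';
  for (let i = 0; i < bytes.length; i++) {
    const byte = bytes[i];
    if (!Number.isInteger(byte) || byte < 0 || byte > 255) {
      throw new RangeError('Byte out of range: ' + byte);
    }
    // Left-pad single hex digits so every byte uses two characters
    hex += (byte < 16 ? '0' : '') + byte.toString(16);
  }
  return hex;
}

const fromHex = hexToBytesSketch('ffff00');                 // [255, 255, 0]
const toHex = bytesToHexSketch(new Uint8Array([1, 171]));   // '01ab'
```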

pack and packTo

pack(format, values) will return an Array with the bytes of the passed value or values. values can be a single item or an array.


let packed = pack('h', 1000);
// return [232, 3]
packed = pack('>hh', [1000,1]);
//return [3, 232, 0, 1]

packTo(format, values, buffer, index) will write the bytes of the value to a buffer (any Array-like object). Writing starts at index. If no index is given, index defaults to 0.


// Create a Uint8Array with size=4
let buffer = new Uint8Array(4);
// Start writing on index=2, writing the bytes of the 16-bit value to
// buffer[2] and buffer[3]. buffer[0] and buffer[1] are left untouched
packTo('h', 402, buffer, 2);

index can be omitted and will default to zero:


// Create a Uint8Array with size=4
let buffer = new Uint8Array(4);
// Start writing on index=0, writing the bytes of the 16-bit value to
// buffer[0] and buffer[1]. buffer[2] and buffer[3] are left untouched
packTo('h', 402, buffer);

If the output buffer size is smaller than required by the data and output is a typed array, it throws a Bad buffer length error:


// Create a Uint8Array with size=1
let buffer = new Uint8Array(1);
packTo('h', 402, buffer);
//Error: Bad buffer length
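For comparison, here is what the same kind of write looks like with the standard DataView API. This is a minimal sketch that does not use binary-tools; it only shows which bytes of the buffer a 16-bit little-endian write at index 2 touches, and that the platform API also refuses to write past the end of a buffer (with a RangeError, which is not necessarily the error binary-tools throws).

```javascript
// Uses only the standard DataView API; binary-tools is not involved.
const target = new Uint8Array(4);
const targetView = new DataView(target.buffer);

// Write the 16-bit value 402 as little-endian starting at byte offset 2
targetView.setInt16(2, 402, true);
// target is now [0, 0, 146, 1]; the first two bytes are untouched

// Writing a 16-bit value into a 1-byte buffer throws a RangeError
let outOfBounds = false;
try {
  new DataView(new ArrayBuffer(1)).setInt16(0, 402, true);
} catch (err) {
  outOfBounds = err instanceof RangeError;
}
```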

write and writeTo

write(format, values) will return an Array with the bytes of the passed values. values can be a single item or an array.


let packed = write('h', [1000, 1, 1, 1]);
// return [232, 3, 1, 0, 1, 0, 1, 0]

writeTo(format, values, buffer, index) will write the bytes to the provided buffer (any Array-like object). Writing starts at index. If index is omitted, it defaults to 0.


// Create a Uint8Array with size=4
let buffer = new Uint8Array(4);
// Write all values to the buffer
writeTo('h', [402, 1], buffer);

write(), writeTo(), read() and readTo() should be used with a single type at a time and are meant to read/write long sequences where all values are of the same type (such as in media files).

read/write functions work with a single type at a time. Any extra types defined in the format string after the first one will be ignored ('>Hbb' will be handled as '>H').

If the output buffer size is smaller than required by the data and output is a typed array, it throws a Bad buffer length error:


// Create a Uint8Array with size=2, but the values
// need size=4
let smallBuffer = new Uint8Array(2);
btools.writeTo('h', [1, 1], smallBuffer);
// Error: Bad buffer length

// Create a Uint8Array with size=1, but the value uses
// 2 bytes
let tinyBuffer = new Uint8Array(1);
btools.writeTo('h', [1], tinyBuffer);
// Error: Bad buffer length

// Size matches the number of values and the type size;
// this is OK
let exactBuffer = new Uint8Array(2);
btools.writeTo('h', [1], exactBuffer);
// exactBuffer is now [1, 0]

Packing null, false, true and undefined

Attempts to pack or write integers with the following values:

  • undefined
  • null
  • true
  • false

will throw a TypeError.

If you wish, for example, to pack 1 for true and 0 for false and null you should either use the boolean ('?') type or change your input array. Notice that the boolean type will cause any value to become either true or false when packing/unpacking.


Unpacking and input buffer length

By default, when the byte buffer does not have enough bytes for the value being unpacked, nothing is written to the output array.

You can unpack in safe mode by setting the optional safe param to true. In safe mode insufficient bytes in the input array cause a 'Bad buffer length' error to be thrown:


// Does not throw an error; returns an empty array
let buffer = [0xff];
btools.unpack('H', buffer, 0, false);

// throws a 'Bad buffer length' error
buffer = [0xff];
btools.unpack('H', buffer, 0, true);

// throws a 'Bad buffer length' error (start reading on index=2,
// attempt to unpack a 16-bit number from a single byte)
buffer = [0xff, 0xff, 0xff];
btools.unpack('H', buffer, 2, true);

// does not throw an error (start reading on index=1,
// so skip the first byte and only read the last 2 bytes as a 16-bit number)
buffer = [0xff, 0xff, 0xff];
btools.unpack('H', buffer, 1, true);
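The difference between the default mode and safe mode comes down to one length check before reading. A minimal sketch of that check for a single unsigned 16-bit little-endian value; unpackUint16Sketch is a hypothetical helper, not the binary-tools implementation, and the error type it throws is an assumption.

```javascript
// Hypothetical helper, not part of binary-tools.
// Reads one little-endian unsigned 16-bit value starting at index.
function unpackUint16Sketch(buffer, index, safe) {
  if (index + 2 > buffer.length) {
    if (safe) {
      throw new Error('Bad buffer length');
    }
    return []; // not enough bytes; nothing is unpacked
  }
  return [buffer[index] | (buffer[index + 1] << 8)];
}

const lenient = unpackUint16Sketch([0xff], 0, false);            // []
const strict = unpackUint16Sketch([0xff, 0xff, 0xff], 1, true);  // [65535]
```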

For the read and readTo methods, as the entire input buffer is expected to be unpacked, extra bytes in the input array will also throw an error in safe mode. If safe is set to false, the extra bytes at the end of the input buffer will be ignored.


// readTo()

let output = [];

// throws a 'Bad buffer length' error (insufficient bytes)
let buffer = [0xff];
btools.readTo('H', buffer, output, 0, buffer.length, true);

// Does not throw an error; writes nothing to the output array
buffer = [0xff];
btools.readTo('H', buffer, output, 0, buffer.length, false);

// throws a 'Bad buffer length' error; extra byte at the end of the input buffer
buffer = [0xff, 0xff, 0xff];
btools.readTo('H', buffer, output, 0, buffer.length, true);

// read()

// throws a 'Bad buffer length' error (insufficient bytes)
buffer = [0xff];
btools.read('H', buffer, 0, buffer.length, true);

// Does not throw an error; returns an empty array
buffer = [0xff];
btools.read('H', buffer, 0, buffer.length, false);

// throws a 'Bad buffer length' error; extra byte at the end of the input buffer
buffer = [0xff, 0xff, 0xff];
btools.read('H', buffer, 0, buffer.length, true);

Floating-point numbers

  • Floating-point numbers follow the IEEE 754 standard.
  • NaN is packed as quiet NaN. Both quiet NaN and signaling NaN can be unpacked, both unpacked as NaN. Unpacking NaN with extra information on the significand is supported and will also result in NaN (extra information will be lost).
  • Supports packing and unpacking negative zeros.
  • Supports packing and unpacking Infinity and negative Infinity.

Minifloats

binary-tools has native support for 16-bit half-precision numbers (format code 'e') and for brain floating point numbers (format code 'g').
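To make the two 16-bit formats concrete, here is a minimal sketch of how their bit patterns map to JavaScript Numbers. halfToNumber and bfloat16ToNumber are hypothetical helpers written for illustration, not binary-tools functions. The half-precision layout (1 sign bit, 5 exponent bits, 10 fraction bits) comes from IEEE 754; a bfloat16 value is the upper 16 bits of a 32-bit float, so it can be decoded by zero-filling the lower 16 bits.

```javascript
// Hypothetical helpers, not part of binary-tools.
// Decode an IEEE 754 half-precision value from its 16-bit pattern.
function halfToNumber(bits) {
  const sign = (bits & 0x8000) ? -1 : 1;
  const exponent = (bits >> 10) & 0x1F;
  const fraction = bits & 0x03FF;
  if (exponent === 0) {
    // Zero (including negative zero) and subnormal numbers
    return sign * Math.pow(2, -14) * (fraction / 1024);
  }
  if (exponent === 0x1F) {
    // All exponent bits set: Infinity or NaN
    return fraction ? NaN : sign * Infinity;
  }
  return sign * Math.pow(2, exponent - 15) * (1 + fraction / 1024);
}

// Decode a bfloat16 value: its 16 bits are the upper half of a float32.
function bfloat16ToNumber(bits) {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint16(0, bits, false); // upper 16 bits, big-endian
  view.setUint16(2, 0, false);    // lower 16 bits are zero
  return view.getFloat32(0, false);
}

const halfOne = halfToNumber(0x3C00);      // 1
const halfInf = halfToNumber(0x7C00);      // Infinity
const brainOne = bfloat16ToNumber(0x3F80); // 1
```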


Integers

  • Overflow on integers will throw a RangeError.
  • Packing values other than integers will throw a TypeError.
  • You may clamp the input to avoid RangeError by setting clamp to true.

To clamp integers on overflow and avoid RangeError, set the optional clamp param to true:


// Set clamp to true; values will be packed as their max or min values
// on overflow. In this case, packing an array of unsigned 8-bit ints
write('B', [1, 259, 2], true);
// will return [1, 255, 2]

// Set clamp to false; overflows cause a RangeError
write('B', [1, 259, 2], false);
// will throw a RangeError; this is the same as
write('B', [1, 259, 2]); // (omitting the clamp param)
// which will also throw a RangeError

Signed integers

Signed integers are two's complement.
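The integer rules in this part of the documentation (TypeError for non-integers, RangeError on overflow unless clamping is enabled, two's complement for negative values) can be seen together in one small function. This is a minimal sketch for signed 8-bit values only; packInt8Sketch is a hypothetical helper, not the binary-tools implementation.

```javascript
// Hypothetical helper, not part of binary-tools.
// Encodes a signed 8-bit integer as a single two's complement byte.
function packInt8Sketch(value, clamp) {
  if (!Number.isInteger(value)) {
    throw new TypeError('Not an integer: ' + value);
  }
  if (value < -128 || value > 127) {
    if (!clamp) {
      throw new RangeError('Overflow: ' + value);
    }
    // Clamp to the nearest representable value
    value = Math.min(127, Math.max(-128, value));
  }
  // Masking with 0xFF yields the two's complement bit pattern
  return value & 0xFF;
}

const minusOne = packInt8Sketch(-1);       // 255
const clamped = packInt8Sketch(200, true); // 127
```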

64-bit, 128-bit, 256-bit, 512-bit and 1024-bit integers

binary-tools has native support for packing and unpacking 64-bit, 128-bit, 256-bit, 512-bit and 1024-bit integers in environments that support BigInt. Other types are not affected; if you are not using BigInts, you can use this module in any environment or browser with no polyfills or extra code needed.

64-bit numbers are available as signed (type 'q') and unsigned (type 'Q'). 128, 256, 512 and 1024-bit numbers are only available as unsigned (types 'A', 'Y', 'O', 'M') as they are usually unsigned. You can define signed variations for them if for some reason you need this kind of feature:


btools.types.a = { bits: 128, signed: true };
btools.pack('a', [-1n]);
// will output [255, 255, 255, 255, 255, 255, 255, 255,
//              255, 255, 255, 255, 255, 255, 255, 255]

Internally, all BigInts are represented using BigInt(), not 0n notation.
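The reason BigInt is needed is that a regular Number can only represent integers exactly up to 2^53 - 1 (Number.MAX_SAFE_INTEGER). A minimal sketch of packing a 64-bit value with BigInt arithmetic follows; packUint64Sketch is a hypothetical helper, not the binary-tools implementation. It uses BigInt() calls rather than 0n literals, so the source still parses in engines that do not know the literal syntax.

```javascript
// Hypothetical helper, not part of binary-tools. Requires BigInt support.
// Packs a 64-bit BigInt as 8 little-endian bytes.
function packUint64Sketch(value) {
  const bytes = [];
  // Wrap into the unsigned 64-bit range (negative values become two's complement)
  let rest = BigInt.asUintN(64, value);
  for (let i = 0; i < 8; i++) {
    bytes.push(Number(rest & BigInt(0xFF)));
    rest = rest >> BigInt(8);
  }
  return bytes;
}

const one = packUint64Sketch(BigInt(1));
// [1, 0, 0, 0, 0, 0, 0, 0]
const allOnes = packUint64Sketch(BigInt(-1));
// [255, 255, 255, 255, 255, 255, 255, 255]
```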

Nibbles

With the non-standard type 'N' (nibbles, 4-bit integers) values will be packed and unpacked as pairs, each pair in a single byte. You may use nibbles in the format string just like any type:


// packing a signed 16-bit integer and 2 nibbles
btools.pack('hNN', [1, 15, 15]);
// will return [1, 0,   255] (1, 0 is the integer, 255 is the nibble pair)

// unpacking a signed 16-bit integer and 2 nibbles
btools.unpack('hNN', [1, 0,   255]);
// will return [1, 15, 15]

Packing and unpacking Nibbles

You may pack or unpack a single nibble, or odd numbers of nibbles, but one nibble will occupy a full byte too, just like a pair would:


// packing a single nibble will only write the high nibble.
// 15 (max value of a nibble) is used to better illustrate the result:
btools.pack('N', [15]);
// will return [240] (240 = 11110000)

// packing an odd number of nibbles
btools.pack('NNN', [15, 15,   15]);
// will return [255, 240] (255 is the first pair, 240 is the last lonely nibble)
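
The pairing rule can be sketched as follows. This is an illustration under the assumption that the first nibble of each pair goes to the high 4 bits, as the results above suggest; packNibbles is a hypothetical name:

```javascript
// Hypothetical helper, not part of binary-tools: pack 4-bit values
// two per byte, first nibble of each pair in the high 4 bits.
function packNibbles(nibbles) {
  var bytes = [];
  for (var i = 0; i < nibbles.length; i += 2) {
    var high = nibbles[i];
    // A lonely last nibble is paired with zero
    var low = i + 1 < nibbles.length ? nibbles[i + 1] : 0;
    if (high < 0 || high > 15 || low < 0 || low > 15) {
      throw new RangeError('Nibbles must be in the range 0..15');
    }
    bytes.push((high << 4) | low);
  }
  return bytes;
}

console.log(packNibbles([15]));         // [240]
console.log(packNibbles([15, 15, 15])); // [255, 240]
```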

Values of type 'N' are unsigned. The max value for a nibble is 15; any value greater than that will cause a RangeError:


btools.pack('N', [16])
// will throw a RangeError

Note that due to the way nibbles are represented as bytes there is no RangeError when unpacking a nibble, since both high nibble and low nibble will always have 4 bits each.

Reading and writing Nibbles

Single nibbles or odd numbers of nibbles are only available for the packing and unpacking functions. When using the writing functions write and writeTo, only pairs of nibbles can be written; if the input array has an odd number of elements, an error will be thrown:


btools.write('N', [15, 0,   15, 0,   15]);
// will throw an error

var buff = new Uint8Array(3);
btools.writeTo('N', [1, 0,   1, 0,   15], buff);
// will throw an error

var buff = new Uint8Array(3);
btools.writeTo('N', [1, 0,   1, 0,   15, 0], buff);
// this is fine

When using the reading functions read and readTo, nibbles will always be unpacked as pairs:


btools.read('N', [16]);
// will return [1, 0]

btools.read('N', [240]);
// will return [15, 0]
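
Splitting a byte back into its two nibbles takes one shift and one mask. A minimal sketch, with the hypothetical helper unpackNibbles:

```javascript
// Hypothetical helper, not part of binary-tools: every byte
// always yields a pair (high nibble first, then low nibble).
function unpackNibbles(bytes) {
  var nibbles = [];
  for (var i = 0; i < bytes.length; i++) {
    nibbles.push(bytes[i] >> 4, bytes[i] & 15);
  }
  return nibbles;
}

console.log(unpackNibbles([16]));  // [1, 0]
console.log(unpackNibbles([240])); // [15, 0]
```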

Single-byte characters and strings

Types 'c' and 'C' represent single characters. Type 'c' represents a single ASCII character (from 0 to 127). Type 'C' represents a single ISO-8859-1 character (from 0 to 255), covering the first 256 characters of Unicode. Both always use one byte.

Types 's' and 'S' represent strings. Type 's' represents a string of ASCII characters, and type 'S' represents a string of ISO-8859-1 characters. The strings use one byte per character.

When packing or unpacking strings of single-byte characters (types 's' and 'S'), each code in the format string represents an independent string. The string size is always determined by the repeat count number, and sequential 's' or 'S' characters mean independent strings.

Unless you need to enforce that the characters are 7-bit ASCII characters, you should generally use types 'C' and 'S'.

Single characters (types 'c' and 'C') behave the same way as the numeric types when it comes to format strings:


btools.pack('cH2c', ['a', 1, 'b', 'b']);
// will output [97, 1, 0, 98, 98]

For string types, the repeat count number represents the size of the string in bytes. Since every character uses exactly one byte, it is also the number of characters:


btools.pack('sH2s', ['a', 1, 'bb']);
// will output [97, 1, 0, 98, 98]

If a string has fewer characters than specified in the format, it will be padded with null bytes to fit the size defined in the format string:


btools.pack('sH4s', ['a', 1, 'bb']);
// will output [97, 1, 0, 98, 98, 0, 0]

If a string has more characters than specified in the format, it will be trimmed to fit the size defined in the format string:


btools.pack('sH2s', ['a', 1, 'bbbb']);
// will output [97, 1, 0, 98, 98]
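
The padding and trimming rules for single-byte strings can be sketched in a few lines. This is only an illustration; packFixedString is a hypothetical helper and not part of the library:

```javascript
// Hypothetical helper, not part of binary-tools: encode a string
// of single-byte characters into a slot of exactly `size` bytes.
function packFixedString(str, size) {
  var bytes = [];
  for (var i = 0; i < size; i++) {
    // Past the end of the string, pad with null bytes
    var code = i < str.length ? str.charCodeAt(i) : 0;
    if (code > 255) {
      throw new RangeError('Not a single-byte character: ' + str[i]);
    }
    bytes.push(code);
  }
  return bytes;
}

console.log(packFixedString('bb', 4));   // [98, 98, 0, 0]
console.log(packFixedString('bbbb', 2)); // [98, 98]
```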

Unicode strings

Unicode strings (types 'u', 'U' and 'V') are normally used with reading/writing functions, but may be used with packing/unpacking functions as well.

For Unicode strings, the repeat count number in the format string when using pack(), packTo(), unpack() or unpackTo() represents the number of bytes that will be encoded or decoded, not the number of characters.

You must take into consideration that in Unicode a character may use more than one byte, so simply counting the number of characters will not give you the correct number of bytes needed to encode the string; there are functions to calculate the number of bytes for a given Unicode string so you can check whether it fits in the buffer. Read more about them below.

Repeat count numbers are mainly used with Unicode when you have a fixed space for a string in a buffer, but the string may or may not use all the bytes reserved for it. In that case, the string is packed and the remaining bytes are set to NULL (0). On the other hand, if the string uses more bytes than defined in the format string, it will be trimmed as appropriate to make it fit - the same behavior that is expected from strings that use single-byte character encodings.

For example, if a UTF-8 string occupies 4 bytes when encoded, but there is a slot of 256 bytes for a UTF-8 string in a file header, the 4 bytes of the encoded string would be written to the buffer while the remaining 252 bytes would be written as NULL, and the index of the writing head will be moved to the next index immediately after the pre-defined UTF-8 slot to continue packing the next values as defined in the format string.

UTF-8 strings

UTF-8 strings (type 'u') may use from 1 to 4 bytes per character. If there is no room for a full character to be packed according to the given size (for example, the repeat count number is 5, but the string uses 6 bytes, with the last character using 3 bytes), the character that does not fit will be trimmed and the remaining bytes will be filled with NULL bytes:


// The string '美麗' uses 6 bytes and the slot has 6 bytes,
// so the string can be correctly packed:
btools.pack('6u', '美麗');
// will return [231, 190, 142,   233, 186, 151]

// The string uses 6 bytes, 3 for each character, but the slot only
// has 5 bytes; the last code point will be trimmed and any remaining
// space will be filled with NULL bytes:
btools.pack('5u', '美麗');
// will return [231, 190, 142,   0, 0]

If the slot size is greater than the string, then null bytes will be used to fit the size:


// The string '美麗' uses 6 bytes and the slot has 8 bytes,
// so the string can be correctly packed and the remaining bytes
// will be written as NULL
btools.pack('8u', '美麗');
// will return [231, 190, 142,   233, 186, 151,   0, 0]
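
The trim-and-pad behavior shown in these examples can be approximated with the standard TextEncoder API, assuming an environment that provides it (modern browsers and Node.js). The helper packUtf8Slot is hypothetical and only mirrors the results above:

```javascript
// Hypothetical helper, not part of binary-tools: encode a string as
// UTF-8 into a slot of `size` bytes, dropping any code point that
// does not fit entirely and padding the rest with null bytes.
function packUtf8Slot(str, size) {
  var encoder = new TextEncoder();
  var bytes = [];
  // for...of iterates a string by code point, not by UTF-16 unit
  for (var ch of str) {
    var encoded = encoder.encode(ch);
    if (bytes.length + encoded.length > size) {
      break;
    }
    for (var i = 0; i < encoded.length; i++) {
      bytes.push(encoded[i]);
    }
  }
  while (bytes.length < size) {
    bytes.push(0);
  }
  return bytes;
}

console.log(packUtf8Slot('美麗', 5)); // [231, 190, 142, 0, 0]
console.log(packUtf8Slot('美麗', 8)); // [231, 190, 142, 233, 186, 151, 0, 0]
```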

If using packTo() with typed arrays, in case the typed array size is not enough to fit all bytes as defined by the repeat count number, a Bad buffer length error will be thrown:


var byteBuffer = new Uint8Array(5);
btools.packTo('6u', ['美麗'], byteBuffer);
// will throw a Bad buffer length error

When unpacking UTF-8, if the last bytes according to the size defined in the format string are not enough to read a complete character, they will be treated as an invalid character and the replacement character will be added to the string:


// size does not reach the last byte of the last character
btools.unpack('5u', [231, 190, 142,   233, 186, 151]);
// will return ['美�']

// size match the 2 characters
btools.unpack('6u', [231, 190, 142,   233, 186, 151]);
// will return ['美麗']
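
The standard TextDecoder API treats a truncated sequence the same way, which makes it handy for cross-checking results. The snippet below does not use binary-tools and assumes an environment where TextDecoder is available:

```javascript
var decoder = new TextDecoder('utf-8');

// Only the first 5 of the 6 bytes: the second character is incomplete
var truncated = decoder.decode(new Uint8Array([231, 190, 142, 233, 186]));
console.log(truncated === '美\uFFFD'); // true (U+FFFD is the replacement character)

// All 6 bytes: both characters are decoded
var complete = decoder.decode(new Uint8Array([231, 190, 142, 233, 186, 151]));
console.log(complete === '美麗'); // true
```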

If the slot is greater than the string, then the Unicode character 'NULL' (U+0000) will be included in the resulting string for every null byte present in the slot:


// Reading a slot of 10 bytes, but the string encoded at the slot
// only uses the first 6 bytes:
btools.unpack('10u', [231, 190, 142,   233, 186, 151,   0, 0, 0, 0]);
// will return ['美麗\u0000\u0000\u0000\u0000']

Finding out the buffer size for a UTF-8 string

To find out how many bytes are needed for a given UTF-8 string, use the utf8BufferSize() method:


btools.utf8BufferSize('Hello, world!');
// will return 13; the fact that this is the same number of
// characters in the string is a mere coincidence

btools.utf8BufferSize('Having a romantic dinner with my 👩‍❤️‍👨.');
// will return 54 - the 👩‍❤️‍👨 emoji alone uses 20 bytes.
// It is actually 6 code points: 3 emojis joined by
// Zero Width Joiners that render as a single emoji in most fonts.
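
The byte counts above can be double-checked without the library by encoding the string with the standard TextEncoder API. The helper name utf8ByteLength is hypothetical, and the emoji is written with escapes so each code point is visible:

```javascript
// Hypothetical helper, not part of binary-tools
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// woman + ZWJ + heavy heart + variation selector + ZWJ + man
var couple = '\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F468}';

console.log(utf8ByteLength('Hello, world!')); // 13
console.log(Array.from(couple).length);       // 6 code points
console.log(utf8ByteLength(couple));          // 20
```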

UTF-8 strings with write() and read()

UTF-8 strings are more commonly used with writing and reading functions; using them with writing and reading functions is also far simpler.

When using type 'u' with the write() function, the full string is always encoded regardless of its size:


btools.write('u', 'Hello, world!');
// will return
// [72, 101, 108, 108, 111,   44,   32,   119, 111, 114, 108, 100,   33]

When using type 'u' with the reading functions read() and readTo(), if no start index or end index are given, then the full string will be unpacked:


btools.read('u', [72, 101, 108, 108, 111,   44,   32,   119, 111, 114, 108, 100,   33]);
// will return ['Hello, world!']

Start and end indexes may be given when reading the byte buffer; in this case, only the bytes in the slice will be decoded. Other than that, the behavior is the same as using no indexes.

UTF-16 strings

The same rules used for UTF-8 also apply to UTF-16 strings (type 'U') - the differences are that a character may use 2 or 4 bytes and that UTF-16 also has the notion of endianness, while UTF-8 does not. This adds the restriction that only even byte counts may be used in the format string for packing or unpacking, and that only arrays or array slices with even byte counts can be used for reading.


// The string '慈愛' uses 4 bytes, 2 bytes for each character;
// since the format defines 4 bytes, it can be correctly packed:
btools.pack('4U', '慈愛');
// will return [72, 97,   27, 97]

// Uneven byte len; this size is not valid for UTF-16
// and will cause an error to be thrown
btools.pack('3U', '慈愛');
// will throw a Bad buffer length error

// The format string says 2 bytes, but the string needs 4 bytes
// to be encoded, 2 bytes per character; only the first character
// will be encoded.
btools.pack('2U', '慈愛');
// will return [72, 97]

// The format string says 6 bytes, but the string needs 8 bytes
// to be encoded, 4 bytes per character; only the first character
// will be encoded and remaining bytes will be filled with NULL.
btools.pack('!6U', '😀😀');
// will return [216, 61, 222, 0,   0, 0]

// The format string says 6 bytes, but the string only needs 4;
// the remaining bytes will be written as NULL
btools.pack('6U', '慈愛');
// will return [72, 97,   27, 97,   0, 0]

If using packTo() or writeTo() with typed arrays, in case the typed array size is not enough to fit the size defined in the format string, a Bad buffer length error will be thrown:


var byteBuffer = new Uint8Array(2);
btools.packTo('4U', ['慈愛'], byteBuffer);
// will throw a Bad buffer length error

var byteBuffer = new Uint8Array(2);
btools.writeTo('2U', ['慈愛'], byteBuffer);
// will throw a Bad buffer length error

If the size is greater than the string, then null bytes will be used to fit the size:


btools.pack('6U', '慈愛');
// will return [72, 97,   27, 97,   0, 0]

When unpacking UTF-16, if the last character uses more bytes than the size defined in the format string, it will be treated as an invalid character and the replacement character will be used in the resulting string:


// size does not reach the last byte of the last character
// last character is 😀, which uses 4 bytes
btools.unpack('4U', [72, 97,   61, 216, 0, 222]);
// will return ['慈�']

// size match the 2 characters
btools.unpack('6U', [72, 97,   61, 216, 0, 222]);
// will return ['慈😀']

If the slot is greater than the string, then the Unicode character 'NULL' (U+0000) will be included in the resulting string for every null code point present in the slot:


// Reading a slot of 6 bytes, but the string encoded at the slot
// only uses the first 4 bytes; NULL characters will be included.
// In this case a single NULL character, since the remaining 2 bytes
// are a single code point
btools.unpack('6U', [72, 97,   27, 97,   0, 0]);
// will return ['慈愛\u0000']

When unpacking UTF-16, the byte count in the format string must always be even, since every character uses either 2 or 4 bytes:


// Uneven byte count; will throw an error
btools.unpack('5U', [72, 97,   27, 97,   0, 0]);
// will throw an Error

UTF-16 and endianness

UTF-16 has the notion of endianness, so byte order operators in the format string will affect how strings are packed and unpacked.

When using the operators '<' or '>' to enforce endianness, if no BOM is present on the string, a BOM will be automatically added and this must be considered in the repeat count number:


// Operator is '<' and the string does not have a BOM; a BOM will be added.
// Even if the characters only use 4 bytes, 6 bytes must be defined
// in the repeat count to make room for the BOM
btools.pack('<6U', '慈愛');
// will return [255, 254,   72, 97,   27, 97]

// If the repeat count number does not include space for the BOM,
// and there is not enough room for the last character, only
// the BOM and the first character will be encoded
btools.pack('<4U', '慈愛');
// will return [255, 254,   72, 97]

When using the operators '!' or '=' to represent endianness, no BOM will be added, so the repeat count number only needs to consider the number of bytes used by the characters:


// Use '=' operator to represent little endian; no BOM will be included
btools.pack('=4U', '慈愛');
// will return [72, 97,   27, 97]

// Use '!' operator to represent big endian; no BOM will be included
btools.pack('!4U', '慈愛');
// will return [97, 72,   97, 27]

// Use '=' operator to represent little endian with a string that has a BOM
// In this case, the BOM will be packed like any other character. Notice that,
// like any other character, the BOM must be accounted for in the repeat count:
btools.pack('=6U', '\ufeff慈愛');
// will return [255, 254,   72, 97,   27, 97]
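
To see where these byte sequences come from, the sketch below serializes the UTF-16 code units of a JavaScript string by hand in both byte orders. It is independent of binary-tools and the helper utf16Bytes is hypothetical:

```javascript
// Hypothetical helper, not part of binary-tools: serialize the
// UTF-16 code units of a string as little or big endian bytes.
function utf16Bytes(str, littleEndian) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    var high = unit >> 8;
    var low = unit & 255;
    if (littleEndian) {
      bytes.push(low, high);
    } else {
      bytes.push(high, low);
    }
  }
  return bytes;
}

console.log(utf16Bytes('慈愛', true));        // [72, 97, 27, 97]
console.log(utf16Bytes('慈愛', false));       // [97, 72, 97, 27]
// A BOM (U+FEFF) is serialized like any other code unit
console.log(utf16Bytes('\ufeff慈愛', true));  // [255, 254, 72, 97, 27, 97]
```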

To find out how many bytes are needed for a given UTF-16 string, use the utf16BufferSize() function.


let buffer_ = new Uint8Array(
    btools.utf16BufferSize('Rafael is ❤️‍🔥 about his work!'));
btools.writeTo('U', 'Rafael is ❤️‍🔥 about his work!', buffer_);
let unpacked = btools.read('U', buffer_);
// unpacked = ['Rafael is ❤️‍🔥 about his work!']

Finding out the buffer size for a UTF-16 string

To find out how many bytes are needed for a UTF-16 string, use the utf16BufferSize() method:


btools.utf16BufferSize('Hello, world!');
// will return 26

The method accepts an optional parameter forceBOM to indicate that a BOM should be considered in the size even if the original string does not have a BOM:


btools.utf16BufferSize('Hello, world!');
// will return 26

// forceBOM set to true, and there is no BOM in the string;
// in this case utf16BufferSize() will return the number of
// bytes needed to encode the string + 2 extra bytes to make
// room for the BOM:
btools.utf16BufferSize('Hello, world!', true); 
// will return 28; 26 bytes for characters + 2 bytes for the BOM

// BOM in the string, forceBOM true; in this case the
// BOM will be counted like any other character, and no
// extra bytes will be considered in the count
btools.utf16BufferSize('\ufeffHello, world!', true); 
// will also return 28

// BOM in the string, forceBOM false; in this case the
// BOM will be counted, too
btools.utf16BufferSize('\ufeffHello, world!'); 
// will also return 28
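
Since JavaScript strings are already sequences of UTF-16 code units, the sizes above can be reproduced with a small sketch. The helper utf16ByteLength is hypothetical and only mimics the documented results; it is not the library's implementation:

```javascript
// Hypothetical helper, not part of binary-tools: every UTF-16
// code unit in a JavaScript string takes 2 bytes.
function utf16ByteLength(str, forceBOM) {
  var size = str.length * 2;
  // Reserve room for a BOM only when the string lacks one
  if (forceBOM && str.charCodeAt(0) !== 0xFEFF) {
    size += 2;
  }
  return size;
}

console.log(utf16ByteLength('Hello, world!'));             // 26
console.log(utf16ByteLength('Hello, world!', true));       // 28
console.log(utf16ByteLength('\ufeffHello, world!', true)); // 28
console.log(utf16ByteLength('\ufeffHello, world!'));       // 28
```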

UTF-16 strings with write() and read()

When using type 'U' with write() the full string will be encoded:


btools.write('U', 'Hello, world!');
// will return
// [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 44, 0, 32, 0,
//   119, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33, 0];

When using type 'U' with the reading functions read() and readTo, if no start index or end index are given, then the full string will be decoded:


btools.read('U', [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 44, 0, 32, 0,
    119, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33, 0]);
// will return ['Hello, world!']

Start and end indexes may be given when reading the byte buffer; in this case, only the bytes in the slice defined by the indexes will be decoded. Other than that, the behavior is the same as using no indexes.

Note that when reading either entire arrays or slices of arrays, the byte count must be even or a Bad buffer length error will be thrown:


// Array does not have an even size; will throw an error
btools.read('U', [72, 97,   27, 97,  0]);
// Throws a Bad buffer length error

You may adjust the size using the start and end params to fit a valid UTF-16 buffer size.


// Adjust the size to create a valid slice
// of the buffer for UTF-16:
btools.read('U', [72, 97,   27, 97,  0], 0, 4);
// will return ['慈愛']

UTF-32 strings

The same rules used for UTF-16 also apply to UTF-32 strings (type 'V'). The main difference is that it always uses 4 bytes per code point, so the byte count must be always a multiple of 4.


// This string has 3 characters (regardless of being rendered
// as 2 characters in some fonts). Since every character uses
// 4 bytes, 12 bytes are needed to encode it:
btools.pack('12V', 'सुख');
// Will return [56, 9, 0, 0,   65, 9, 0, 0,   22, 9, 0, 0]

// Repeat count is not a multiple of 4; Not a valid count
// for UTF-32, and will cause an error to be thrown
btools.pack('10V', 'सुख');
// Will throw a Bad buffer length error

// Repeat count is smaller than the number of bytes needed to encode
// the string; only the first two characters will be encoded:
btools.pack('8V', 'सुख'); 
// Will return [56, 9, 0, 0,   65, 9, 0, 0]
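
The fixed 4 bytes per code point can be seen by serializing the code points by hand. This sketch is independent of binary-tools, assumes little-endian output, and uses the hypothetical helper utf32Bytes:

```javascript
// Hypothetical helper, not part of binary-tools: serialize each
// code point of a string as 4 little-endian bytes.
function utf32Bytes(str) {
  var bytes = [];
  // for...of iterates by code point, so surrogate pairs stay together
  for (var ch of str) {
    var codePoint = ch.codePointAt(0);
    bytes.push(
      codePoint & 255,
      (codePoint >> 8) & 255,
      (codePoint >> 16) & 255,
      (codePoint >>> 24) & 255
    );
  }
  return bytes;
}

// U+0938, U+0941, U+0916: the three code points of 'सुख'
console.log(utf32Bytes('\u0938\u0941\u0916'));
// [56, 9, 0, 0, 65, 9, 0, 0, 22, 9, 0, 0]
```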

If using packTo() or writeTo() with typed arrays, in case the typed array size is not enough to fit the size defined in the format string, a Bad buffer length error will also be thrown:


var byteBuffer = new Uint8Array(6);
btools.packTo('20V', ['🚵🏻‍♂️'], byteBuffer);
// will throw a Bad buffer length error; this character
// needs 20 bytes, as it uses 5 code points

If the size is greater than the string, then null bytes will be used to fit the size:


btools.pack('24V', '🚵🏻‍♂️');
// will return [181, 246, 1, 0,   251, 243, 1, 0,   13, 32, 0, 0,
//              66, 38, 0, 0,   15, 254, 0, 0,   0, 0, 0, 0]

With UTF-32, the byte count in the format string must always be a multiple of 4, since every character always uses 4 bytes:


// byte count is not a multiple of 4
btools.unpack('7V', [120, 243, 1, 0,   121, 243, 1, 0]);
// will throw an Error

// size match all the characters
btools.unpack('8V', [120, 243, 1, 0,   121, 243, 1, 0]);
// will return ['🍸🍹']

If the slot is greater than the string, then the Unicode character 'NULL' (U+0000) will be included in the resulting string for every null code point (4 null bytes) present in the slot:


// Reading a slot of 12 bytes, but the string encoded at the slot
// only uses the first 8 bytes:
btools.unpack('12V', [120, 243, 1, 0,   121, 243, 1, 0,   0, 0, 0, 0]);
// will return ['🍸🍹\u0000']

UTF-32 and endianness

When using the operators '<' or '>' to enforce endianness, if no BOM is present on the string, a BOM will be automatically added and this must be considered in the repeat count number:


// Operator is '<' and the string does not have a BOM; a BOM will be added.
// Even if the characters only use 4 bytes each, 16 bytes must be defined
// in the repeat count to make room for the BOM
btools.pack('<16V', 'सुख');
// will return [255, 254, 0, 0,   56, 9, 0, 0,   65, 9, 0, 0,   22, 9, 0, 0]

// Number of bytes is a multiple of 4, but not enough to encode
// the full string; only the BOM and the first 2 characters will
// be encoded.
btools.pack('<12V', 'सुख');
// will return [255, 254, 0, 0,   56, 9, 0, 0,   65, 9, 0, 0]

When using the operators '!' or '=' to represent endianness, no BOM will be added, so the repeat count number only needs to consider the number of bytes used by the characters already present in the string:


// Use '=' operator to represent little endian; no BOM will be included
// this is the same as using no operator at all
btools.pack('=12V', 'सुख');
// will return [56, 9, 0, 0,   65, 9, 0, 0,   22, 9, 0, 0]

// Use '!' operator to represent big endian; no BOM will be included
btools.pack('!12V', 'सुख');
// will return [0, 0, 9, 56,   0, 0, 9, 65,   0, 0, 9, 22]

// Use '=' operator to represent little endian with a string that has a BOM
// In this case, the BOM will be packed like any other character. Notice that,
// like any character, the size of the BOM must be considered in the repeat count:
btools.pack('=16V', '\ufeffसुख');
// will return [255, 254, 0, 0,   56, 9, 0, 0,   65, 9, 0, 0,   22, 9, 0, 0]

To determine how many bytes are needed for a given UTF-32 string, use the utf32BufferSize() function:


let buffer_ = new Uint8Array(
    btools.utf32BufferSize('Time to 🚵🏻‍♂️'));
btools.writeTo('V', 'Time to 🚵🏻‍♂️', buffer_);
let unpacked = btools.read('V', buffer_);
// unpacked = ['Time to 🚵🏻‍♂️']

Finding out the buffer size for a UTF-32 string

To find out how many bytes are needed for a UTF-32 string, use the utf32BufferSize() method:


btools.utf32BufferSize('Hello, world!');
// will return 52

The method accepts an optional parameter forceBOM to indicate that a BOM should be accounted for in the size even if the original string does not have a BOM:


btools.utf32BufferSize('Hello, world!');
// will return 52

// forceBOM set to true
btools.utf32BufferSize('Hello, world!', true);
// will return 56; 52 bytes for characters + 4 bytes for the BOM

// BOM in the string, forceBOM true
btools.utf32BufferSize('\ufeffHello, world!', true);
// will also return 56

// BOM in the string, forceBOM false
btools.utf32BufferSize('\ufeffHello, world!');
// will also return 56
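
These sizes follow from counting code points and multiplying by 4. A minimal sketch with the hypothetical helper utf32ByteLength, which only mimics the documented results:

```javascript
// Hypothetical helper, not part of binary-tools: every code
// point takes exactly 4 bytes in UTF-32.
function utf32ByteLength(str, forceBOM) {
  // Array.from splits a string by code point, not by UTF-16 unit
  var size = Array.from(str).length * 4;
  // Reserve room for a BOM only when the string lacks one
  if (forceBOM && str.charCodeAt(0) !== 0xFEFF) {
    size += 4;
  }
  return size;
}

console.log(utf32ByteLength('Hello, world!'));             // 52
console.log(utf32ByteLength('Hello, world!', true));       // 56
console.log(utf32ByteLength('\ufeffHello, world!', true)); // 56
// The mountain biker emoji is a sequence of 5 code points
console.log(utf32ByteLength('\u{1F6B5}\u{1F3FB}\u200D\u2642\uFE0F')); // 20
```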

UTF-32 strings with write() and read()

When using type 'V' with write() the full string will be encoded:


btools.write('V', 'Hello, world!');
// will return
// [72, 0, 0, 0,   101, 0, 0, 0,   108, 0, 0, 0,   108, 0, 0, 0,
//   111, 0, 0, 0,   44, 0, 0, 0,   32, 0, 0, 0,
//   119, 0, 0, 0,   111, 0, 0, 0,   114, 0, 0, 0,
//   108, 0, 0, 0,   100, 0, 0, 0,   33, 0, 0, 0];

When using type 'V' with the reading functions read() or readTo, if no start index or end index are given, then the full string will be unpacked:


btools.read('V', [72, 0, 0, 0,   101, 0, 0, 0,   108, 0, 0, 0,   108, 0, 0, 0,
    111, 0, 0, 0,   44, 0, 0, 0,   32, 0, 0, 0,
    119, 0, 0, 0,   111, 0, 0, 0,   114, 0, 0, 0,
    108, 0, 0, 0,   100, 0, 0, 0,   33, 0, 0, 0]);
// will return ['Hello, world!']

Start and end indexes may be given when reading the byte buffer; in this case, only the bytes in the slice will be decoded. Other than that, the behavior is the same as using no indexes.

Note that when reading either entire arrays or slices of arrays, the byte count must be a multiple of 4 or a Bad buffer length error will be thrown:


// Byte count is not a multiple of 4, an error will be thrown
btools.read('V', [120, 243, 1, 0,   121, 243, 1, 0,   0]);
// Will throw a Bad buffer length error

You may adjust the size using the start and end params to fit a valid UTF-32 buffer size:


// Adjust the size to create a valid slice
// of the buffer for UTF-32:
btools.read('V', [120, 243, 1, 0,   121, 243, 1, 0,   0], 0, 8);
// will return ['🍸🍹']

API

Packing/unpacking functions:


/**
 * Pack values to a byte buffer.
 * @param {string} format The struct format definition.
 * @param {!Array<number|string|bigint>} values The values to pack.
 * @param {!Array<number>} buffer The byte buffer to write on.
 * @param {number=} [index=0] The buffer index to start writing.
 * @param {boolean=} [clamp=false] True to clamp ints on overflow.
 * @return {number} The next index to write.
 * @throws {Error} On unsupported type.
 * @throws {Error} If the output buffer is a typed array and size is not valid.
 * @throws {RangeError} On integer overflow if clamp is set to false.
 * @throws {TypeError} If 'values' contains invalid values for the types.
 */
export function packTo(format, values, buffer, index=0, clamp=false) {}

/**
 * Unpack values from an array of bytes to an Array-like object.
 * Always start writing the output at the beginning of the output array.
 * @param {string} format The struct format definition.
 * @param {!Array<number>} buffer The byte buffer to unpack.
 * @param {!Array<number|string|bigint|boolean>} output The output array.
 * @param {number=} [index=0] The buffer index to read.
 * @param {boolean=} [safe=false] If set to false, extra bytes in the end of
 *   the input array are ignored and input buffers with insufficient bytes will
 *   write nothing to the output array. If safe is set to true the function
 *   will throw a 'Bad buffer length' error on the aforementioned cases.
 * @throws {Error} On unsupported type.
 * @throws {Error} On bad input buffer length if on safe mode.
 */
export function unpackTo(format, buffer, output, index=0, safe=false) {}

/**
 * Pack values as an array of bytes.
 * @param {string} format The struct format definition.
 * @param {!Array<number|string|bigint>} values The values to pack.
 * @param {boolean=} [clamp=false] True to clamp ints on overflow.
 * @return {!Array<number>} The packed values.
 * @throws {Error} On unsupported type.
 * @throws {RangeError} On overflow if clamp is set to false.
 * @throws {TypeError} If 'values' is not a number.
 * @throws {TypeError} If 'values' is not an int and type is int.
 */
export function pack(format, values, clamp=false) {}

/**
 * Unpack values from an array of bytes.
 * This method returns an Array even if only a single value is unpacked.
 * @param {string} format The struct format definition.
 * @param {!Array<number>} buffer The byte buffer to unpack.
 * @param {number=} [index=0] The buffer index to read.
 * @param {boolean=} [safe=false] If set to false, extra bytes in the end of
 *   the input array are ignored and input buffers with insufficient bytes will
 *   write nothing to the output array. If safe is set to true the function
 *   will throw a 'Bad buffer length' error on the aforementioned cases.
 * @return {!Array<number|string|bigint|boolean>} The unpacked values.
 * @throws {Error} On unsupported type.
 * @throws {Error} On bad input buffer length if on safe mode.
 */
export function unpack(format, buffer, index=0, safe=false) {}

Writing/reading functions to handle long sequences of the same type:


/**
 * Write an array of values to a byte buffer.
 * @param {string} format The format definition.
 * @param {!Array<number|string|bigint>} values The values to write.
 * @param {!Array<number>} buffer The buffer to write on.
 * @param {number=} [index=0] The buffer index to start writing.
 * @param {boolean=} [clamp=false] True to clamp ints on overflow.
 * @return {number} The next index to write on the buffer.
 * @throws {Error} On unsupported type.
 * @throws {Error} If the output buffer is a typed array and size is not valid.
 * @throws {RangeError} On integer overflow if clamp is set to false.
 * @throws {TypeError} If 'values' contains invalid values for the types.
 */
export function writeTo(format, values, buffer, index=0, clamp=false) {}

/**
 * Read an array of values from a byte buffer to an array or a typed array.
 * @param {string} format The format definition.
 * @param {!Array<number>} buffer The byte buffer.
 * @param {!Array<number|string|bigint|boolean>} output The output array.
 * @param {number=} [start=0] The input buffer index to start reading.
 * @param {number=} [end=buffer.length] The input buffer index to stop reading.
 * @param {boolean=} [safe=false] If set to false, extra bytes in the end of
 *   the input array are ignored and input buffers with insufficient bytes will
 *   write nothing to the output array. If safe is set to true the function
 *   will throw a 'Bad buffer length' error on the aforementioned cases.
 * @throws {Error} On unsupported type.
 * @throws {Error} On bad input buffer length if on safe mode.
 */
export function readTo(
    format, buffer, output, start=0, end=buffer.length, safe=false) {}

/**
 * Write an array of values as an array of bytes.
 * @param {string} format The format definition.
 * @param {!Array<number|string|bigint>} values The values to pack.
 * @param {boolean=} [clamp=false] True to clamp ints on overflow.
 * @return {!Array<number>} The packed values.
 * @throws {Error} On unsupported type.
 * @throws {RangeError} On overflow if clamp is set to false.
 * @throws {TypeError} If 'values' is not an array of numbers.
 * @throws {TypeError} If 'values' is not an array of ints and type is int.
 */
export function write(format, values, clamp=false) {}

/**
 * Read an array of values from a byte buffer.
 * @param {string} format The format definition.
 * @param {!Array<number>} buffer The byte buffer.
 * @param {number=} [start=0] The buffer index to start reading.
 * @param {number=} [end=buffer.length] The buffer index to stop reading.
 * @param {boolean=} [safe=false] If set to false, extra bytes in the end of
 *   the input array are ignored and input buffers with insufficient bytes will
 *   write nothing to the output array. If safe is set to true the function
 *   will throw a 'Bad buffer length' error on the aforementioned cases.
 * @return {!Array<number|string|bigint|boolean>}
 * @throws {Error} On unsupported type.
 * @throws {Error} On bad input buffer length if on safe mode.
 */
export function read(
    format, buffer, start=0, end=buffer.length, safe=false) {}

Note that in version 1.x the end param for read and readTo is non-inclusive, so it must be set always as index + 1. For example, to read from array position 0 to position 7 you should read('u', buffer, 0, 8).
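
A non-inclusive end index works the same way as in the standard Array.prototype.slice(), which the following snippet uses as an analogy. It does not call binary-tools; the buffer contents are just sample data:

```javascript
// UTF-8 bytes of 'Hello, wo'
var buffer = [72, 101, 108, 108, 111, 44, 32, 119, 111];

// To cover positions 0 to 7, the end index is 7 + 1
var slice = buffer.slice(0, 7 + 1);

console.log(slice.length); // 8
console.log(new TextDecoder().decode(new Uint8Array(slice))); // 'Hello, w'
```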

Tools:


/**
 * Swap the byte ordering in a buffer. The buffer is modified in place.
 * @param {!Array<number>} buffer The bytes.
 * @param {number} offset The byte offset.
 * @param {number=} [start=0] The start index.
 * @param {number=} [end=buffer.length] The end index.
 * @function endianness
 */
export function endianness(buffer, offset, start=0, end=buffer.length) {}

/**
 * Calculate the buffer size based on a format string.
 * @param {string} format The format string.
 * @param {boolean=} [includePads=false] True to include pads in the count.
 * @return {number}
 * @throws {Error} On unsupported type.
 * @function calcSize
 */
export function calcSize(format, includePads=false) {}

/**
 * Returns how many bytes are needed to serialize a UTF-8 string.
 * @param {string} str The string to pack.
 * @return {number} The number of bytes needed for the string.
 */
export function utf8BufferSize(str) {}

/**
 * Returns how many bytes are needed to serialize a UTF-16 string.
 * @param {string} str The string.
 * @param {boolean=} [forceBOM=false] If BOM should be enforced or not.
 *   If false (default), then it will only count the bytes according to
 *   the characters on the string; if true and string have no BOM, then
 *   it will count the bytes of the characters plus BOM (size + 2).
 * @return {number} The number of bytes needed for the string.
 */
export function utf16BufferSize(str, forceBOM=false) {}

/**
 * Returns how many bytes are needed to serialize a UTF-32 string.
 * @param {string} str The string.
 * @param {boolean=} [forceBOM=false] If BOM should be enforced or not.
 *   If false (default), then it will only count the bytes according to
 *   the characters on the string; if true and string have no BOM, then
 *   it will count the bytes of the characters plus BOM (size + 4).
 * @return {number} The number of bytes needed for the string.
 */
export function utf32BufferSize(str, forceBOM=false) {}

/**
 * Format a byte array as a string of hex numbers.
 * @param {!Array<number>} bytes The bytes.
 * @return {string}
 * @throws {RangeError} If bytes contains invalid values.
 */
export function bytesToHex(bytes) {};

/**
 * Convert a hex string to a byte array.
 * @param {string} hexStr The hex string.
 * @return {!Array<number>}
 * @throws {RangeError} If string contains chars outside the 0..f range.
 * @throws {Error} If string length is not even.
 */
export function hexToBytes(hexStr) {};

Note that in future versions the includePads param for calcSize will be set to true by default. It defaults to false in 1.x releases so as not to break the 1.x interface. Apart from a few situations, you should always use calcSize with includePads set to true.

Note that in version 1.x the end param for endianness is non-inclusive, so it must be set always as index + 1.
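
As a rough sketch of what an in-place byte order swap with a non-inclusive end index looks like, consider the helper below. It is hypothetical, not the library's code, and it assumes the offset argument means the number of bytes in each value:

```javascript
// Hypothetical helper, not part of binary-tools: reverse the bytes
// of every `size`-byte value between start and end (end excluded).
function swapByteOrder(buffer, size, start, end) {
  start = start === undefined ? 0 : start;
  end = end === undefined ? buffer.length : end;
  for (var i = start; i + size <= end; i += size) {
    for (var a = i, b = i + size - 1; a < b; a++, b--) {
      var tmp = buffer[a];
      buffer[a] = buffer[b];
      buffer[b] = tmp;
    }
  }
}

var bytes = [1, 2, 3, 4, 5, 6, 7, 8];
// Swap the 16-bit values at positions 0 to 3; end is 3 + 1
swapByteOrder(bytes, 2, 0, 4);
console.log(bytes); // [2, 1, 4, 3, 5, 6, 7, 8]
```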

The types object:


/**
 * A binary-tools datatype definition object.
 * @typedef {Object.<string, number|boolean>} DataType
 * @property {number} bits The number of bits used by the type.
 * @property {boolean} signed False for unsigned, true for signed.
 * @property {boolean} fp True for floating point numbers only.
 * @property {boolean} c True for single-byte character encodings.
 * @property {boolean} b Alternative type. Used only by types g and N.
 * @property {boolean} v Variable length. Only true for string types.
 * @property {boolean} u False for all except UTF strings.
 * @memberOf module:binary-tools
 */

/**
 * The standard binary-tools datatype definitions.
 * New types can be created by adding properties named with a single
 * character to this object, in the format:
 * { bits: number, signed: boolean, fp: boolean }
 * Digits and the characters \@#~=<>! are reserved, as are all
 * characters used for predefined types (xcCsSuUVNbB?hHegtTlLfjJkKdqQAYOM).
 * @type {Object}
 * @property {DataType} x Pad byte
 * @property {DataType} c ASCII character
 * @property {DataType} C Extended ASCII (ISO-8859-1) character
 * @property {DataType} s ASCII string
 * @property {DataType} S Extended ASCII (ISO-8859-1) string
 * @property {DataType} u UTF-8 string
 * @property {DataType} U UTF-16 string
 * @property {DataType} V UTF-32 string
 * @property {DataType} N Unsigned 4-bit integers
 * @property {DataType} b Signed 8-bit integers
 * @property {DataType} B Unsigned 8-bit integers
 * @property {DataType} ? Boolean
 * @property {DataType} h Signed 16-bit integers
 * @property {DataType} H Unsigned 16-bit integers
 * @property {DataType} e 16-bit Half-precision floating-point numbers
 * @property {DataType} g 16-bit Brain floating point numbers
 * @property {DataType} t Signed 24-bit integers
 * @property {DataType} T Unsigned 24-bit integers
 * @property {DataType} l Signed 32-bit integers
 * @property {DataType} L Unsigned 32-bit integers
 * @property {DataType} f 32-bit Single-precision floating-point numbers
 * @property {DataType} j Signed 40-bit integers
 * @property {DataType} J Unsigned 40-bit integers
 * @property {DataType} k Signed 48-bit integers
 * @property {DataType} K Unsigned 48-bit integers
 * @property {DataType} d 64-bit Double-precision floating-point numbers
 * @property {DataType} q Signed 64-bit integers
 * @property {DataType} Q Unsigned 64-bit integers
 * @property {DataType} A Unsigned 128-bit integers
 * @property {DataType} Y Unsigned 256-bit integers
 * @property {DataType} O Unsigned 512-bit integers
 * @property {DataType} M Unsigned 1024-bit integers
 * @memberOf module:binary-tools
 */
export const types = {
  'x': {bits: 0, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'c': {bits: 7, signed: false, fp: false, c: true, b: false, v: false, u: false},
  'C': {bits: 8, signed: false, fp: false, c: true, b: false, v: false, u: false},
  's': {bits: 7, signed: false, fp: false, c: true, b: false, v: true, u: false},
  'S': {bits: 8, signed: false, fp: false, c: true, b: false, v: true, u: false},
  'u': {bits: 8, signed: false, fp: false, c: true, b: false, v: true, u: true},
  'U': {bits: 16, signed: false, fp: false, c: true, b: false, v: true, u: true},
  'V': {bits: 32, signed: false, fp: false, c: true, b: false, v: true, u: true},
  'N': {bits: 4, signed: false, fp: false, c: false, b: true, v: false, u: false},
  'b': {bits: 8, signed: true, fp: false, c: false, b: false, v: false, u: false},
  'B': {bits: 8, signed: false, fp: false, c: false, b: false, v: false, u: false},
  '?': {bits: 1, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'h': {bits: 16, signed: true, fp: false, c: false, b: false, v: false, u: false},
  'H': {bits: 16, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'e': {bits: 16, signed: true, fp: true, c: false, b: false, v: false, u: false},
  'g': {bits: 16, signed: true, fp: true, c: false, b: true, v: false, u: false},
  't': {bits: 24, signed: true, fp: false, c: false, b: false, v: false, u: false},
  'T': {bits: 24, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'l': {bits: 32, signed: true, fp: false, c: false, b: false, v: false, u: false},
  'L': {bits: 32, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'f': {bits: 32, signed: true, fp: true, c: false, b: false, v: false, u: false},
  'j': {bits: 40, signed: true, fp: false, c: false, b: false, v: false, u: false},
  'J': {bits: 40, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'k': {bits: 48, signed: true, fp: false, c: false, b: false, v: false, u: false},
  'K': {bits: 48, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'd': {bits: 64, signed: true, fp: true, c: false, b: false, v: false, u: false},
  'q': {bits: 64, signed: true, fp: false, c: false, b: false, v: false, u: false},
  'Q': {bits: 64, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'A': {bits: 128, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'Y': {bits: 256, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'O': {bits: 512, signed: false, fp: false, c: false, b: false, v: false, u: false},
  'M': {bits: 1024, signed: false, fp: false, c: false, b: false, v: false, u: false}
};
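
As an illustration of the rule for user-defined types, the sketch below registers a 12-bit unsigned integer under the character `z`. To stay self-contained it works on a small stand-in object rather than the real `types` export; `defineType` is a hypothetical helper, and treating the backslash as reserved is a conservative assumption.

```javascript
// Stand-in for the exported `types` object (only two entries shown).
const typesSketch = {
  'B': {bits: 8, signed: false, fp: false},
  'H': {bits: 16, signed: false, fp: false}
};

// Characters the documentation lists as reserved or already in use.
const RESERVED = '0123456789\\@#~=<>!';
const PREDEFINED = 'xcCsSuUVNbB?hHegtTlLfjJkKdqQAYOM';

// Hypothetical helper: validates the name, then adds the definition.
function defineType(table, key, def) {
  if (typeof key !== 'string' || key.length !== 1) {
    throw new Error('Type name must be a single character');
  }
  if (RESERVED.indexOf(key) !== -1 || PREDEFINED.indexOf(key) !== -1) {
    throw new Error('Character "' + key + '" is reserved');
  }
  table[key] = {bits: def.bits, signed: !!def.signed, fp: !!def.fp};
  return table[key];
}

// A 12-bit unsigned integer (Uint12) named "z".
defineType(typesSketch, 'z', {bits: 12, signed: false, fp: false});
```

With the real library, the documentation above suggests the equivalent is a direct assignment on the imported object, such as `types['z'] = {bits: 12, signed: false, fp: false}`.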

Additional files

The TypeScript declarations are in ./index.d.ts

The TypeScript declarations for the ES3 distribution are in ./dist/binary-tools.d.ts

The Closure Compiler externs are in ./externs/binary-tools.js


Reporting security issues

Report security issues to this e-mail: [email protected].


Buying an LGPL version of this software:

You can buy an LGPL version of this software at https://rochars.com/binary-tools and use it as you please according to the terms of that license. Other licensing options are available, too. They include the complete source code, the tests, documentation and all files related to the project for your convenience.

The unpaid version of this software is released under the CC-BY-NC-ND-4.0 License. You are free to use it under the terms of the CC-BY-NC-ND-4.0 License. View COPYING for more information.


LICENSE

binary-tools: JavaScript binary tools for any browser or environment.
Copyright (C) 2023 Rafael da Silva Rocha

This software is released under the Creative Commons CC-BY-NC-ND-4.0 license (very restrictive, no commercial use). You may purchase a version of this software released under the LGPL for your freedom and convenience.

For commercial use:

  • Buy now: $10.00  | CC-BY-ND-4.0

    No sources included, licensed as CC-BY-ND-4.0 for commercial use.

Full source code:

  • Buy now: $40.00  | GPL

    Comes multi-licensed with GPL-2.0 and GPL-3.0

  • Buy now: $80.00  | LGPL

    Comes multi-licensed with LGPL-2.1, LGPL-3.0, MPL-2.0 and GPLs

Multi-licensing ensures that you can choose the license that works best for you.

Payment:

PayPal: [email protected] PIX: [email protected]

The prices are just for reference. You may pay in your own currency as long as the value is similar to the reference price. I accept some cryptocurrencies. If you have any questions or are having trouble with the payment methods listed above, email me at [email protected] and we will find a solution. You don't need to worry about the precise exchange rate for the day :)

Please contact me at [email protected] with your payment method and the payment date. You don't have to wait for a response to make your purchase; just let me know what you're buying and the origin of the transfer.

You will receive a download link as soon as I confirm your payment.

Licenses apply to specific versions. That said, after your first purchase, I'll be happy to send you any updates at your request, free of charge.

Last reviewed on 2023-09-23


Talk to me: [email protected]

I may be available to work on your project.