It contains functions from both the "string" and "utf8" standard libraries. "utf8string.codepoint" is aliased as "utf8string.code" for convenience; use it to get a character's code instead of "utf8string.byte", which fetches the specified byte rather than a text character. A "utf8.bytelen" function is added to get the string length in bytes rather than in text characters. Uppercase and lowercase conversion works with just about every writing system that has character casing (Japanese writing, for example, has no casing, and neither do emojis). In addition to "utf8string.charpattern" there are "utf8string.upperpattern" and "utf8string.lowerpattern", which contain exhaustive lists of uppercase and lowercase characters.
*on the assumption that you're using normalized Unicode strings with no composite characters
**message or email me about any instance where the utf8string library doesn't work the same way as the string standard library
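To illustrate the byte versus character distinction described above, here is a minimal sketch that uses only the standard Lua 5.3+ "string" and "utf8" libraries. It does not call utf8string itself, so treat it as background for why a code-point function is the right tool, not as a demonstration of the library's API. The variable names are made up for the example.

```lua
-- Standard Lua 5.3+ only; the utf8string library is not needed to run this.
local s = "d\u{ED}a"                     -- "día"; U+00ED takes two bytes in UTF-8

local byte_at_2 = string.byte(s, 2)      -- first byte of the encoded "í"
local code_at_2 = utf8.codepoint(s, 2)   -- code point that starts at byte 2
local byte_length = #s                   -- length in bytes
local char_length = utf8.len(s)          -- length in code points

print(byte_at_2, code_at_2)              -- 195   237
print(byte_length, char_length)          -- 4     3
```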
String handling functions will readily break when given composite Unicode characters, because they can't tell what constitutes a complete character.
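A rough sketch of the composite character problem, again with the standard Lua 5.3+ "utf8" library only. The per-code-point reversal below is written by hand for illustration and is an assumption about how such a function might work, not code taken from utf8string.

```lua
-- Standard Lua 5.3+ only; names here are hypothetical.
local precomposed = "\u{E9}"      -- "é" as a single code point
local decomposed  = "e\u{301}"    -- "e" followed by a combining acute accent

print(utf8.len(precomposed), utf8.len(decomposed))  -- 1   2
print(precomposed == decomposed)                    -- false

-- Reverse "x" .. decomposed one code point at a time.
local codes = {}
for _, code in utf8.codes("x" .. decomposed) do
  table.insert(codes, 1, code)    -- prepend, so the order ends up reversed
end
local reversed = utf8.char(table.unpack(codes))

-- The accent now comes first and no longer follows the "e" it belonged to.
print(reversed == "\u{301}ex")    -- true
```

Normalizing the input to precomposed form beforehand avoids this for characters that have a precomposed code point.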
Pattern matching will break when mixing Unicode characters with pattern classes and items:
[ ] * + - ? . %b
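The sketch below shows the byte-oriented behaviour of the standard "string" library that sits behind this kind of breakage. It does not call utf8string, and whether utf8string fails in exactly the same way is an assumption; the post only says that these pattern items break.

```lua
-- Standard Lua 5.3+ only; patterns operate on bytes, not on characters.
local e_acute = "\u{E9}"          -- "é", encoded as the two bytes C3 A9

-- A set treats each byte of "é" as a separate member, so it matches one byte.
local set_match = string.match(e_acute, "[" .. e_acute .. "]")
print(#set_match)                 -- 1

-- "?" applies only to the last byte of "é", so the first byte stays mandatory.
local optional = "^a" .. e_acute .. "?$"
local plain_a = string.match("a", optional)
print(plain_a)                    -- nil

-- "." matches a single byte, so it cannot cover a two-byte character.
local dot_match = string.match(e_acute, "^.$")
print(dot_match)                  -- nil
```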