
We want only byte strings, not “wide” Unicode code point notation.
This helps give consistency, clarity, and simplicity.
Text::Extract::MaketextCallPhrases will handle it correctly for Perl notation, but what if you’re not parsing Perl code?
Of course, using Unicode strings when you need to operate under character semantics is the appropriate thing to do, and newer perls have really great tools for that.
However, for localization we are essentially looking phrases up and passing them through without examining them or modifying their collation. So bytes are the way to go for phrases!
With Unicode strings you can get garbled data when they are output to a browser, file, database, or terminal.
Many hashing and encryption routines operate on bytes (passing a Unicode string can be fatal, or you silently get unexpected data).
Solution: You can simply use the character itself, or a bracket notation method for the handful of markup-related or visually special characters.
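
The points above about byte-oriented hashing and garbled output can be seen in a small sketch. It is written in Python rather than Perl, purely as an illustration of the general bytes-versus-code-points problem; the phrase "Café" and the variable names are made up for the example, and Perl's own failure modes differ in detail.

```python
import hashlib

phrase = "Café"                         # a text (code point) string
phrase_bytes = phrase.encode("utf-8")   # the same phrase as UTF-8 bytes

# Hashing operates on bytes; in Python, handing it a text string is fatal.
try:
    hashlib.sha256(phrase)
    hashed_text_ok = True
except TypeError:
    hashed_text_ok = False

# The byte string hashes without any trouble.
digest = hashlib.sha256(phrase_bytes).hexdigest()

# Interpreting the bytes with the wrong encoding gives garbled output.
garbled = phrase_bytes.decode("latin-1")

print(hashed_text_ok)   # False
print(garbled)          # CafÃ©
```

Keeping phrases as bytes from the start means the lookup key, the hashed value, and the output are always the same sequence of bytes, with no encoding step that can be forgotten or applied twice.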

If you get false positives, that only highlights how ambiguity is one more reason to avoid non-bytes strings!
Note that HTML entities are not addressed here, since the Unicode notation as well as other syntax is covered via Ampersand.
This means you have something like \x{NNNN} and need to use the character itself instead.
These will be turned into ‘[comment,non bytes unicode string “\x{NNNN}”]’ (where NNNN is the Unicode code point) so you can find them visually.
This means you have something like \N{…} and need to use the character itself instead.
These will be turned into ‘[comment,charnames.pm type string “\N{…}”]’ so you can find them visually.
This means you have something like \uNNNN and need to use the character itself instead.
These will be turned into ‘[comment,unicode notation “\uNNNN”]’ (where NNNN is the Unicode code point) so you can find them visually.
This means you have something like U'NNNN' and need to use the character itself instead.
These will be turned into ‘[comment,unicode notation “U'NNNN'”]’ (where NNNN is the Unicode code point) so you can find them visually.
This means you have something like U+NNNN and need to use the character itself instead.
These will be turned into ‘[comment,non bytes unicode string “U+NNNN”]’ (where NNNN is the Unicode code point) so you can find them visually.
This means you have something like UxNNNN and need to use the character itself instead.
These will be turned into ‘[comment,non bytes unicode string “UxNNNN”]’ (where NNNN is the Unicode code point) so you can find them visually.
This means you have something like u"\uNNNN" and need to use the character itself instead.
These will be turned into ‘[comment,non bytes unicode string “u"\uNNNN"”]’ (where NNNN is the Unicode code point) so you can find them visually.
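
The replacements described above can be approximated with a short sketch. This is not the module's actual implementation: the function name `flag_code_point_notation`, the group names, and the regular expressions are assumptions made for illustration, including the choice to accept four to six hex digits for NNNN. Only the comment labels and the bracket notation output format are taken from the descriptions above.

```python
import re

HEX = r"[0-9A-Fa-f]{4,6}"  # assumption: NNNN is four to six hex digits

# (group name, pattern, label used in the bracket notation comment)
NOTATIONS = [
    ("u_literal", r'u"\\u' + HEX + r'"', "non bytes unicode string"),
    ("perl_hex", r"\\x\{" + HEX + r"\}", "non bytes unicode string"),
    ("charnames", r"\\N\{[^}]+\}", "charnames.pm type string"),
    ("backslash_u", r"\\u" + HEX, "unicode notation"),
    ("u_quoted", r"U'" + HEX + r"'", "unicode notation"),
    ("u_plus", r"U\+" + HEX, "non bytes unicode string"),
    ("u_x", r"Ux" + HEX, "non bytes unicode string"),
]

LABELS = {name: label for name, _, label in NOTATIONS}

# One combined pattern, applied in a single pass, so text that has already
# been wrapped in a comment is never scanned (and wrapped) a second time.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern, _ in NOTATIONS)
)


def flag_code_point_notation(phrase: str) -> str:
    """Wrap each code point notation in a visible bracket notation comment."""

    def to_comment(match: re.Match) -> str:
        label = LABELS[match.lastgroup]
        return f"[comment,{label} “{match.group(0)}”]"

    return COMBINED.sub(to_comment, phrase)


print(flag_code_point_notation(r"Caf\x{00e9}"))
print(flag_code_point_notation("Price: U+20AC 5"))
```

A phrase that already uses the character itself, such as "Café", passes through unchanged, which is the outcome the filter is steering you toward.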

None