qbits.thorn
alias+category
(alias+category chr)
Retrieves the script block alias and unicode category for a unicode character.
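The description above does not say what `alias+category` returns or what type `chr` must be, so the following is only a rough JDK-based analogue, not thorn's implementation. It assumes a hypothetical helper `script+category`, and it uses `Character.UnicodeScript` and `Character/getType` as stand-ins for the script block alias and the category.

```clojure
(import 'java.lang.Character$UnicodeScript)

;; Hypothetical helper, not part of qbits.thorn: pairs the JDK's script name
;; with the JDK's general-category constant for one code point.
(defn script+category
  [code-point]
  (let [cp (int code-point)]
    [(str (Character$UnicodeScript/of cp))
     (Character/getType cp)]))

(script+category (int \p)) ; LATIN SMALL LETTER P
(script+category (int \ρ)) ; GREEK SMALL LETTER RHO
```

The second element is the integer constant the JDK uses (for example `Character/LOWERCASE_LETTER`), which may differ from the category representation thorn returns.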
confusables
(confusables s preferred-aliases)
(confusables s)
Checks if s contains characters that might be confused with characters from preferred-aliases.
This returns a lazy-seq, so you’re free to check only the first char or the whole string.
preferred-aliases takes a set of unicode block aliases to be considered as your ‘base’ unicode blocks:
- considering paρa,
  - with preferred-aliases #{"LATIN"}, the 3rd character ρ would be returned because this greek letter can be confused with latin p.
  - with preferred-aliases #{"greek"}, the 1st character p would be returned because this latin letter can be confused with greek ρ.
  - without a preferred-aliases, you’ll discover the 29 characters that can be confused with p, the 23 characters that look like a, and the one that looks like ρ (which is, of course, p aka LATIN SMALL LETTER P).
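The lazy behaviour described above can be sketched as follows. This is a usage sketch that assumes `qbits.thorn` is on the classpath. The shape of each returned element is not described here, so no output is shown.

```clojure
(require '[qbits.thorn :as thorn])

;; "paρa" mixes Latin p/a with Greek rho (ρ), as in the description above.
(def suspicious "paρa")

;; Because the result is lazy, `first` stops at the first confusable found
;; instead of scanning the whole string.
(first (thorn/confusables suspicious #{"LATIN"}))

;; `seq` realizes just enough of the result to answer "is there any?"
(boolean (seq (thorn/confusables suspicious #{"LATIN"})))
```

Calling `(thorn/confusables suspicious)` without a second argument is the single-arity form listed in the signature, and it reports confusables without a preferred baseline.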
dangerous?
(dangerous? s preferred-aliases)
(dangerous? s)
Checks if s can be dangerous, i.e. whether it is not only mixed-script but also contains characters from scripts other than those in preferred-aliases that might be confused with characters from scripts in preferred-aliases.
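One way to use this predicate is as a guard on user-supplied identifiers. The sketch below assumes `qbits.thorn` is on the classpath. The helper `acceptable-identifier?` is hypothetical and not part of thorn.

```clojure
(require '[qbits.thorn :as thorn])

;; Hypothetical validator, not part of thorn: rejects identifiers that
;; thorn flags as dangerous relative to a Latin baseline.
(defn acceptable-identifier?
  [s]
  (not (thorn/dangerous? s #{"LATIN"})))

(acceptable-identifier? "papa") ; every character is Latin
(acceptable-identifier? "paρa") ; third character is Greek rho
```

Going by the description above, the second string should be rejected, because it mixes scripts and the Greek ρ can be confused with Latin p.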
mixed-script?
(mixed-script? s allowed-aliases)
(mixed-script? s)
Checks if s contains mixed-script content, ignoring the script block aliases in allowed-aliases. E.g. "B. C" is not considered mixed-script by default: it contains characters from Latin and Common, but Common is excluded by default.
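The idea of excluding Common by default can be illustrated with the JDK alone. This is a minimal sketch and not thorn's code. The helpers `scripts-in` and `mixed?` are hypothetical, and the JDK's script names are assumed to be a reasonable stand-in for thorn's aliases.

```clojure
(import 'java.lang.Character$UnicodeScript)

;; Hypothetical helper: the set of JDK script names for every code point in s.
(defn scripts-in
  [s]
  (into #{}
        (map #(str (Character$UnicodeScript/of (int %))))
        (iterator-seq (.iterator (.codePoints ^String s)))))

;; Hypothetical helper: true when s uses more than one script once the
;; scripts in `allowed` are ignored. Only "COMMON" is ignored by default.
(defn mixed?
  ([s] (mixed? s #{"COMMON"}))
  ([s allowed]
   (> (count (remove allowed (scripts-in s))) 1)))

(mixed? "B. C")     ; => false, Latin plus Common only
(mixed? "B. C" #{}) ; => true, nothing ignored, so Common counts
(mixed? "paρa")     ; => true, Latin plus Greek
```

Iterating over code points rather than chars keeps supplementary-plane characters intact, which matters because a lone surrogate has no meaningful script.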
unique-aliases
(unique-aliases s)
Retrieves all unique script block aliases used in a unicode string.