qbits.thorn
alias+category
(alias+category chr)
Retrieves the script block alias and Unicode category for a Unicode character.
confusables
(confusables s preferred-aliases)
(confusables s)
Checks if s contains characters which might be confusable with characters from preferred-aliases.
This returns a lazy-seq so you’re free to check only the first char or the whole string.
preferred-aliases can take a set of Unicode block aliases to be considered as your ‘base’ Unicode blocks:
- considering paρa:
  - with preferred-aliases #{"LATIN"}, the 3rd character ρ would be returned because this Greek letter can be confused with Latin p.
  - with preferred-aliases #{"greek"}, the 1st character p would be returned because this Latin letter can be confused with Greek ρ.
  - without preferred-aliases, you’ll discover the 29 characters that can be confused with p, the 23 characters that look like a, and the one that looks like ρ (which is, of course, p aka LATIN SMALL LETTER P).
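A minimal usage sketch of the two arities, assuming the qbits.thorn library is on the classpath. The thorn alias and the latin-hits var are illustrative names; the exact shape of each returned element is not documented here, so no output is shown.

```clojure
;; Assumption: qbits.thorn is available as a dependency.
(require '[qbits.thorn :as thorn])

;; Treat LATIN as the 'base' block: only characters that could be
;; mistaken for Latin ones are reported.
(def latin-hits (thorn/confusables "paρa" #{"LATIN"}))

;; The result is a lazy seq, so asking for the first hit should not
;; require checking the rest of the string.
(first latin-hits)

;; Without preferred-aliases every confusable is reported;
;; doall realizes the whole lazy seq.
(doall (thorn/confusables "paρa"))
```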
dangerous?
(dangerous? s preferred-aliases)
(dangerous? s)
Checks if s can be dangerous, i.e. whether it is not only mixed-script but also contains characters from scripts other than the ones in preferred-aliases that might be confusable with characters from scripts in preferred-aliases.
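One possible use is input validation. The sketch below assumes qbits.thorn is on the classpath; safe-username? is a hypothetical helper, not part of the library, and the choice of #{"LATIN"} as the base set is only an example.

```clojure
;; Assumption: qbits.thorn is available as a dependency.
(require '[qbits.thorn :as thorn])

(defn safe-username?
  "Hypothetical helper (not part of thorn): accepts s only when
  dangerous? does not flag it relative to the LATIN block."
  [s]
  (not (thorn/dangerous? s #{"LATIN"})))

;; Latin letters mixed with a Greek look-alike.
(safe-username? "paρa")

;; Latin letters only.
(safe-username? "papa")
```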
mixed-script?
(mixed-script? s allowed-aliases)
(mixed-script? s)
Checks if s contains mixed-script content, excluding the script block aliases in allowed-aliases. E.g. "B. C" is not considered mixed-script by default: it contains characters from Latin and Common, but Common is excluded by default.
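To illustrate why excluding Common matters, here is a self-contained sketch of the idea built on the JDK's java.lang.Character.UnicodeScript. It is not thorn's implementation: script-names and mixed-script-sketch? are hypothetical names, and the JDK's script names may not match thorn's aliases exactly.

```clojure
(defn script-names
  "Hypothetical helper: set of JDK UnicodeScript names used in s."
  [^String s]
  (->> (iterator-seq (.iterator (.codePoints s)))
       (map #(str (Character$UnicodeScript/of (int %))))
       set))

(defn mixed-script-sketch?
  "True when s uses more than one script once the scripts in
  the allowed set have been ignored."
  [s allowed]
  (> (count (remove allowed (script-names s))) 1))

;; Spaces and punctuation belong to COMMON, so ignoring COMMON
;; leaves LATIN as the only script.
(mixed-script-sketch? "B. C" #{"COMMON"}) ;; => false

;; LATIN and GREEK both remain.
(mixed-script-sketch? "paρa" #{"COMMON"}) ;; => true
```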
unique-aliases
(unique-aliases s)
Retrieves all unique script block aliases used in a Unicode string.