qbits.thorn

alias

(alias chr)

Retrieves the script block alias for a Unicode character.

alias+category

(alias+category chr)

Retrieves the script block alias and the Unicode category for a Unicode character.

categories-data

category

(category chr)

Retrieves the Unicode category for a Unicode character.
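To make "script" and "category" concrete, here is a minimal sketch that looks up both with the JDK's built-in Unicode tables. It is not this library's implementation: the helper names script-name and general-category are hypothetical, and the values the JDK reports are not guaranteed to match what alias and category return.

```clojure
(defn script-name
  "Upper-case name of the Unicode script that c (a char or a code point)
  belongs to, according to the JDK. Hypothetical helper."
  [c]
  (str (Character$UnicodeScript/of (int c))))

(defn general-category
  "JDK integer constant for the general category of c. Compare the result
  with Character/LOWERCASE_LETTER and friends. Hypothetical helper."
  [c]
  (Character/getType (int c)))

(script-name \p)       ; LATIN SMALL LETTER P
(script-name \u03C1)   ; GREEK SMALL LETTER RHO
(= (general-category \p) Character/LOWERCASE_LETTER)
```

Punctuation such as the full stop is reported under the COMMON script, which is why a Latin string with punctuation touches more than one script.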

confusables

(confusables s preferred-aliases)
(confusables s)

Checks if s contains characters which might be confusable with characters from preferred-aliases.

This returns a lazy-seq so you’re free to check only the first char or the whole string.

preferred-aliases takes a set of Unicode block aliases to be treated as your ‘base’ Unicode blocks:

  • Considering the string paρa:
    • with preferred-aliases #{"LATIN"}, the 3rd character ρ would be returned, because this Greek letter can be confused with Latin p.
    • with preferred-aliases #{"greek"}, the 1st character p would be returned, because this Latin letter can be confused with Greek ρ.
    • without preferred-aliases, you’ll discover the 29 characters that can be confused with p, the 23 characters that look like a, and the one that looks like ρ (which is, of course, p aka LATIN SMALL LETTER P).
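The paρa example can be sketched without the library. The code below is an illustration only: toy-confusables is a hypothetical two-entry table (the real data lives in confusables-data), toy-confusable-chars is a hypothetical name, and script names come from the JDK rather than from alias. The shape of the return value is also an assumption; the real confusables may return richer data than bare characters.

```clojure
(def toy-confusables
  "HYPOTHETICAL table: char -> set of look-alike chars."
  {\p      #{\u03C1}    ; LATIN SMALL LETTER P looks like GREEK SMALL LETTER RHO
   \u03C1  #{\p}})      ; and the other way round

(defn- script [c]
  (str (Character$UnicodeScript/of (int c))))

(defn toy-confusable-chars
  "Lazy seq of the chars in s that are outside the preferred scripts
  but have a look-alike inside them."
  [s preferred]
  (filter (fn [c]
            (and (not (contains? preferred (script c)))
                 (some #(contains? preferred (script %))
                       (get toy-confusables c))))
          s))

;; Laziness means `first` stops at the first offending character.
(first (toy-confusable-chars "pa\u03C1a" #{"LATIN"}))  ; the rho
(first (toy-confusable-chars "pa\u03C1a" #{"GREEK"}))  ; the Latin p
```

Because filter is lazy, calling first inspects only as much of the string as needed, while realizing the whole seq reports every offending character.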

confusables-data

dangerous?

(dangerous? s preferred-aliases)
(dangerous? s)

Checks whether s is dangerous, i.e. whether it is not only mixed-script but also contains characters from scripts outside preferred-aliases that might be confused with characters from scripts in preferred-aliases.

mixed-script?

(mixed-script? s allowed-aliases)
(mixed-script? s)

Checks whether s contains mixed-script content, ignoring the script block aliases in allowed-aliases. E.g. B. C is not considered mixed-script by default: it contains characters from Latin and Common, but Common is excluded by default.
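The B. C case can be reproduced with a small sketch built on the JDK's script tables. This is not the library's code: toy-mixed-script? is a hypothetical name, and the default of #{"COMMON"} is an assumption that mirrors the description above.

```clojure
(defn toy-mixed-script?
  "True when s uses more than one script once the allowed ones are ignored."
  ([s] (toy-mixed-script? s #{"COMMON"}))   ; assumed default
  ([s allowed]
   (let [scripts (into #{}
                       (comp (map #(str (Character$UnicodeScript/of (int %))))
                             (remove allowed))
                       s)]
     (> (count scripts) 1))))

(toy-mixed-script? "B. C")        ; only LATIN remains after ignoring COMMON
(toy-mixed-script? "B. C" #{})    ; LATIN and COMMON both count
(toy-mixed-script? "pa\u03C1a")   ; LATIN and GREEK
```

Passing an empty set shows why Common is ignored by default: otherwise any Latin text containing a space or punctuation would count as mixed.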

unique-aliases

(unique-aliases s)

Retrieves all unique script block aliases used in a Unicode string.
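One detail worth keeping in mind when collecting scripts from a JVM string: characters outside the Basic Multilingual Plane occupy two chars, so a per-char walk sees surrogates instead of the real character. The sketch below walks code points instead. It uses the JDK's script names and a hypothetical helper called scripts-in; whether unique-aliases handles such characters the same way is not stated here.

```clojure
(defn scripts-in
  "Set of JDK script names used in s, iterating by code point so that
  supplementary characters are classified correctly. Hypothetical helper."
  [^String s]
  (into #{}
        (map #(str (Character$UnicodeScript/of (int %))))
        (iterator-seq (.iterator (.codePoints s)))))

;; U+10330 GOTHIC LETTER AHSA needs two chars in a JVM string.
(def gothic-ahsa (String. (Character/toChars 0x10330)))

(scripts-in (str "a" gothic-ahsa))
```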