ios - Working with Unicode code points in Swift -
if not interested in details of mongolian want quick answer using , converting unicode values in swift, skip down first part of accepted answer.
background
i want render unicode text traditional mongolian used in ios apps. better , long term solution use aat smart font render complex script. (such fonts exist license not allow modification , non-personal use.) however, since have never made font, let alone of rendering logic aat font, plan rendering myself in swift now. maybe @ later date can learn make smart font.
externally use unicode text, internally (for display in uitextview) convert unicode individual glyphs stored in dumb font (coded unicode pua values). rendering engine needs convert mongolian unicode values (range: u+1820 u+1842) glyph values stored in pua (range: u+e360 u+e5cf). anyway, plan since what did in java in past, maybe need change whole way of thinking.
example
the following image shows su written twice in mongolian using 2 different forms letter u (in red). (mongolian written vertically letters being connected cursive letters in english.)

in unicode these 2 strings expressed
var suform1: string = "\u{1830}\u{1826}" var suform2: string = "\u{1830}\u{1826}\u{180b}" the free variation selector (u+180b) in suform2 recognized (correctly) swift string unit u (u+1826) precedes it. considered swift single character, extended grapheme cluster. however, purposes of doing rendering myself, need differentiate u (u+1826) , fvs1 (u+180b) 2 distinct utf-16 code points.
for internal display purposes, convert above unicode strings following rendered glyph strings:
suform1 = "\u{e46f}\u{e3ba}" suform2 = "\u{e46f}\u{e3bb}" question
i have been playing around swift string , character. there lot of convenient things them, since in particular case deal exclusively utf-16 code units, wonder if should using old nsstring rather swift's string. realize can use string.utf16 utf-16 code points, the conversion string isn't nice.
would better stick string , character or should use nsstring , unichar?
what have read
updates question have been hidden in order clean page up. see edit history.
updated swift 3
string , character
for in future visits question, string , character answer you.
set unicode values directly in code:
var string: string = "i want visit 北京, Москва, मुंबई, القاهرة, , 서울시. 😊" var character: character = "🌍" use hexadecimal set values
var string: string = "\u{61}\u{5927}\u{1f34e}\u{3c0}" // a大🍎π var character: character = "\u{65}\u{301}" // é = "e" + accent mark see this question also.
convert unicode values:
string.utf8 string.utf16 string.unicodescalars // utf-32 string(character).utf8 string(character).utf16 string(character).unicodescalars convert unicode hex values:
let hexvalue: uint32 = 0x1f34e // convert hex value unicodescalar guard let scalarvalue = unicodescalar(hexvalue) else { // exit if hex not form valid unicode value return } // convert unicodescalar string let mystring = string(scalarvalue) // 🍎 or alternatively:
let hexvalue: uint32 = 0x1f34e if let scalarvalue = unicodescalar(hexvalue) { let mystring = string(scalarvalue) } note utf-8 , utf-16 conversion not easy. (see utf-8, utf-16, , utf-32 questions.)
nsstring , unichar
it possible work nsstring , unichar in swift, should realize unless familiar objective c , @ converting syntax swift, difficult find documentation.
also, unichar uint16 array , mentioned above conversion uint16 unicode scalar values not easy (i.e., converting surrogate pairs things emoji , other characters in upper code planes).
custom string structure
for reasons mentioned in question, ended not using of above methods. instead wrote own string structure, array of uint32 hold unicode scalar values.
again, not solution people. first consider using extensions if need extend functionality of string or character little.
but if need work exclusively unicode scalar values, write custom struct.
the advantages are:
- don't need switch between types (
string,character,unicodescalar,uint32, etc.) when doing string manipulation. - after unicode manipulation finished, final conversion
stringeasy. - easy add more methods when needed
- simplifies converting code java or other languages
disadavantages are:
- makes code less portable , less readable other swift developers
- not tested , optimized native swift types
- it yet file has included in project every time need it
you can make own, here mine reference. hardest part making hashable.
// struct array of uint32 hold unicode scalar values // version 3.4.0 (swift 3 update) struct scalarstring: sequence, hashable, customstringconvertible { fileprivate var scalararray: [uint32] = [] init() { // need go here? } init(_ character: uint32) { self.scalararray.append(character) } init(_ chararray: [uint32]) { c in chararray { self.scalararray.append(c) } } init(_ string: string) { s in string.unicodescalars { self.scalararray.append(s.value) } } // generator in order conform sequencetype protocol // (to allow users iterate in `for myscalarvalue in myscalarstring` { ... }) func makeiterator() -> anyiterator<uint32> { return anyiterator(scalararray.makeiterator()) } // append mutating func append(_ scalar: uint32) { self.scalararray.append(scalar) } mutating func append(_ scalarstring: scalarstring) { scalar in scalarstring { self.scalararray.append(scalar) } } mutating func append(_ string: string) { s in string.unicodescalars { self.scalararray.append(s.value) } } // charat func charat(_ index: int) -> uint32 { return self.scalararray[index] } // clear mutating func clear() { self.scalararray.removeall(keepingcapacity: true) } // contains func contains(_ character: uint32) -> bool { scalar in self.scalararray { if scalar == character { return true } } return false } // description (to implement printable protocol) var description: string { return self.tostring() } // endswith func endswith() -> uint32? { return self.scalararray.last } // indexof // returns first index of scalar string match func indexof(_ string: scalarstring) -> int? { if scalararray.count < string.length { return nil } in 0...(scalararray.count - string.length) { j in 0..<string.length { if string.charat(j) != scalararray[i + j] { break // substring mismatch } if j == string.length - 1 { return } } } return nil } // insert mutating func insert(_ scalar: uint32, atindex index: int) { self.scalararray.insert(scalar, at: index) } mutating func insert(_ string: scalarstring, atindex index: int) { var newindex = index scalar in string { self.scalararray.insert(scalar, at: newindex) newindex += 1 } } mutating func insert(_ string: string, atindex index: int) { var newindex = index scalar in string.unicodescalars { self.scalararray.insert(scalar.value, at: newindex) newindex += 1 } } // isempty var isempty: bool { return self.scalararray.count == 0 } // hashvalue (to implement hashable protocol) var hashvalue: int { // djb hash function return self.scalararray.reduce(5381) { ($0 << 5) &+ $0 &+ int($1) } } // length var length: int { return self.scalararray.count } // remove character mutating func removecharat(_ index: int) { self.scalararray.remove(at: index) } func removingallinstancesofchar(_ character: uint32) -> scalarstring { var returnstring = scalarstring() scalar in self.scalararray { if scalar != character { returnstring.append(scalar) } } return returnstring } func removerange(_ range: countablerange<int>) -> scalarstring? { if range.lowerbound < 0 || range.upperbound > scalararray.count { return nil } var returnstring = scalarstring() in 0..<scalararray.count { if < range.lowerbound || >= range.upperbound { returnstring.append(scalararray[i]) } } return returnstring } // replace func replace(_ character: uint32, withchar replacementchar: uint32) -> scalarstring { var returnstring = scalarstring() scalar in self.scalararray { if scalar == character { returnstring.append(replacementchar) } else { returnstring.append(scalar) } } return returnstring } func replace(_ character: uint32, withstring replacementstring: string) -> scalarstring { var returnstring = scalarstring() scalar in self.scalararray { if scalar == character { returnstring.append(replacementstring) } else { returnstring.append(scalar) } } return returnstring } func replacerange(_ range: countablerange<int>, withstring replacementstring: scalarstring) -> scalarstring { var returnstring = scalarstring() in 0..<scalararray.count { if < range.lowerbound || >= range.upperbound { returnstring.append(scalararray[i]) } else if == range.lowerbound { returnstring.append(replacementstring) } } return returnstring } // set (an alternative myscalarstring = "some string") mutating func set(_ string: string) { self.scalararray.removeall(keepingcapacity: false) s in string.unicodescalars { self.scalararray.append(s.value) } } // split func split(atchar splitchar: uint32) -> [scalarstring] { var partsarray: [scalarstring] = [] if self.scalararray.count == 0 { return partsarray } var part: scalarstring = scalarstring() scalar in self.scalararray { if scalar == splitchar { partsarray.append(part) part = scalarstring() } else { part.append(scalar) } } partsarray.append(part) return partsarray } // startswith func startswith() -> uint32? { return self.scalararray.first } // substring func substring(_ startindex: int) -> scalarstring { // startindex end of string var subarray: scalarstring = scalarstring() in startindex..<self.length { subarray.append(self.scalararray[i]) } return subarray } func substring(_ startindex: int, _ endindex: int) -> scalarstring { // (startindex inclusive, endindex exclusive) var subarray: scalarstring = scalarstring() in startindex..<endindex { subarray.append(self.scalararray[i]) } return subarray } // tostring func tostring() -> string { var string: string = "" scalar in self.scalararray { if let validscalor = unicodescalar(scalar) { string.append(character(validscalor)) } } return string } // trim // removes leading , trailing whitespace (space, tab, newline) func trim() -> scalarstring { //var returnstring = scalarstring() let space: uint32 = 0x00000020 let tab: uint32 = 0x00000009 let newline: uint32 = 0x0000000a var startindex = self.scalararray.count var endindex = 0 // leading whitespace in 0..<self.scalararray.count { if self.scalararray[i] != space && self.scalararray[i] != tab && self.scalararray[i] != newline { startindex = break } } // trailing whitespace in stride(from: (self.scalararray.count - 1), through: 0, by: -1) { if self.scalararray[i] != space && self.scalararray[i] != tab && self.scalararray[i] != newline { endindex = + 1 break } } if endindex <= startindex { return scalarstring() } return self.substring(startindex, endindex) } // values func values() -> [uint32] { return self.scalararray } } func ==(left: scalarstring, right: scalarstring) -> bool { return left.scalararray == right.scalararray } func +(left: scalarstring, right: scalarstring) -> scalarstring { var returnstring = scalarstring() scalar in left.values() { returnstring.append(scalar) } scalar in right.values() { returnstring.append(scalar) } return returnstring }
Comments
Post a Comment