ios - Working with Unicode code points in Swift -


if not interested in details of mongolian want quick answer using , converting unicode values in swift, skip down first part of accepted answer.


background

i want render unicode text traditional mongolian used in ios apps. better , long term solution use aat smart font render complex script. (such fonts exist license not allow modification , non-personal use.) however, since have never made font, let alone of rendering logic aat font, plan rendering myself in swift now. maybe @ later date can learn make smart font.

externally use unicode text, internally (for display in uitextview) convert unicode individual glyphs stored in dumb font (coded unicode pua values). rendering engine needs convert mongolian unicode values (range: u+1820 u+1842) glyph values stored in pua (range: u+e360 u+e5cf). anyway, plan since what did in java in past, maybe need change whole way of thinking.

example

the following image shows su written twice in mongolian using 2 different forms letter u (in red). (mongolian written vertically letters being connected cursive letters in english.)

enter image description here

in unicode these 2 strings expressed

var suform1: string = "\u{1830}\u{1826}" var suform2: string = "\u{1830}\u{1826}\u{180b}" 

the free variation selector (u+180b) in suform2 recognized (correctly) swift string unit u (u+1826) precedes it. considered swift single character, extended grapheme cluster. however, purposes of doing rendering myself, need differentiate u (u+1826) , fvs1 (u+180b) 2 distinct utf-16 code points.

for internal display purposes, convert above unicode strings following rendered glyph strings:

suform1 = "\u{e46f}\u{e3ba}"  suform2 = "\u{e46f}\u{e3bb}" 

question

i have been playing around swift string , character. there lot of convenient things them, since in particular case deal exclusively utf-16 code units, wonder if should using old nsstring rather swift's string. realize can use string.utf16 utf-16 code points, the conversion string isn't nice.

would better stick string , character or should use nsstring , unichar?

what have read

updates question have been hidden in order clean page up. see edit history.

updated swift 3

string , character

for in future visits question, string , character answer you.

set unicode values directly in code:

var string: string = "i want visit 北京, Москва, मुंबई, القاهرة, , 서울시. 😊" var character: character = "🌍" 

use hexadecimal set values

var string: string = "\u{61}\u{5927}\u{1f34e}\u{3c0}" // a大🍎π var character: character = "\u{65}\u{301}" // é = "e" + accent mark 

see this question also.

convert unicode values:

string.utf8 string.utf16 string.unicodescalars // utf-32  string(character).utf8 string(character).utf16 string(character).unicodescalars 

convert unicode hex values:

let hexvalue: uint32 = 0x1f34e  // convert hex value unicodescalar guard let scalarvalue = unicodescalar(hexvalue) else {     // exit if hex not form valid unicode value     return }  // convert unicodescalar string let mystring = string(scalarvalue) // 🍎 

or alternatively:

let hexvalue: uint32 = 0x1f34e if let scalarvalue = unicodescalar(hexvalue) {     let mystring = string(scalarvalue) } 

note utf-8 , utf-16 conversion not easy. (see utf-8, utf-16, , utf-32 questions.)

nsstring , unichar

it possible work nsstring , unichar in swift, should realize unless familiar objective c , @ converting syntax swift, difficult find documentation.

also, unichar uint16 array , mentioned above conversion uint16 unicode scalar values not easy (i.e., converting surrogate pairs things emoji , other characters in upper code planes).

custom string structure

for reasons mentioned in question, ended not using of above methods. instead wrote own string structure, array of uint32 hold unicode scalar values.

again, not solution people. first consider using extensions if need extend functionality of string or character little.

but if need work exclusively unicode scalar values, write custom struct.

the advantages are:

  • don't need switch between types (string, character, unicodescalar, uint32, etc.) when doing string manipulation.
  • after unicode manipulation finished, final conversion string easy.
  • easy add more methods when needed
  • simplifies converting code java or other languages

disadavantages are:

  • makes code less portable , less readable other swift developers
  • not tested , optimized native swift types
  • it yet file has included in project every time need it

you can make own, here mine reference. hardest part making hashable.

// struct array of uint32 hold unicode scalar values // version 3.4.0 (swift 3 update)   struct scalarstring: sequence, hashable, customstringconvertible {      fileprivate var scalararray: [uint32] = []       init() {         // need go here?     }      init(_ character: uint32) {         self.scalararray.append(character)     }      init(_ chararray: [uint32]) {         c in chararray {             self.scalararray.append(c)         }     }      init(_ string: string) {          s in string.unicodescalars {             self.scalararray.append(s.value)         }     }      // generator in order conform sequencetype protocol     // (to allow users iterate in `for myscalarvalue in myscalarstring` { ... })     func makeiterator() -> anyiterator<uint32> {         return anyiterator(scalararray.makeiterator())     }      // append     mutating func append(_ scalar: uint32) {         self.scalararray.append(scalar)     }      mutating func append(_ scalarstring: scalarstring) {         scalar in scalarstring {             self.scalararray.append(scalar)         }     }      mutating func append(_ string: string) {         s in string.unicodescalars {             self.scalararray.append(s.value)         }     }      // charat     func charat(_ index: int) -> uint32 {         return self.scalararray[index]     }      // clear     mutating func clear() {         self.scalararray.removeall(keepingcapacity: true)     }      // contains     func contains(_ character: uint32) -> bool {         scalar in self.scalararray {             if scalar == character {                 return true             }         }         return false     }      // description (to implement printable protocol)     var description: string {         return self.tostring()     }      // endswith     func endswith() -> uint32? {         return self.scalararray.last     }      // indexof     // returns first index of scalar string match     func indexof(_ string: scalarstring) -> int? {          if scalararray.count < string.length {             return nil         }          in 0...(scalararray.count - string.length) {              j in 0..<string.length {                  if string.charat(j) != scalararray[i + j] {                     break // substring mismatch                 }                 if j == string.length - 1 {                     return                 }             }         }          return nil     }      // insert     mutating func insert(_ scalar: uint32, atindex index: int) {         self.scalararray.insert(scalar, at: index)     }     mutating func insert(_ string: scalarstring, atindex index: int) {         var newindex = index         scalar in string {             self.scalararray.insert(scalar, at: newindex)             newindex += 1         }     }     mutating func insert(_ string: string, atindex index: int) {         var newindex = index         scalar in string.unicodescalars {             self.scalararray.insert(scalar.value, at: newindex)             newindex += 1         }     }      // isempty     var isempty: bool {         return self.scalararray.count == 0     }      // hashvalue (to implement hashable protocol)     var hashvalue: int {          // djb hash function         return self.scalararray.reduce(5381) {             ($0 << 5) &+ $0 &+ int($1)         }     }      // length     var length: int {         return self.scalararray.count     }      // remove character     mutating func removecharat(_ index: int) {         self.scalararray.remove(at: index)     }     func removingallinstancesofchar(_ character: uint32) -> scalarstring {          var returnstring = scalarstring()          scalar in self.scalararray {             if scalar != character {                 returnstring.append(scalar)             }         }          return returnstring     }     func removerange(_ range: countablerange<int>) -> scalarstring? {          if range.lowerbound < 0 || range.upperbound > scalararray.count {             return nil         }          var returnstring = scalarstring()          in 0..<scalararray.count {             if < range.lowerbound || >= range.upperbound {                 returnstring.append(scalararray[i])             }         }          return returnstring     }       // replace     func replace(_ character: uint32, withchar replacementchar: uint32) -> scalarstring {          var returnstring = scalarstring()          scalar in self.scalararray {             if scalar == character {                 returnstring.append(replacementchar)             } else {                 returnstring.append(scalar)             }         }         return returnstring     }     func replace(_ character: uint32, withstring replacementstring: string) -> scalarstring {          var returnstring = scalarstring()          scalar in self.scalararray {             if scalar == character {                 returnstring.append(replacementstring)             } else {                 returnstring.append(scalar)             }         }         return returnstring     }     func replacerange(_ range: countablerange<int>, withstring replacementstring: scalarstring) -> scalarstring {          var returnstring = scalarstring()          in 0..<scalararray.count {             if < range.lowerbound || >= range.upperbound {                 returnstring.append(scalararray[i])             } else if == range.lowerbound {                 returnstring.append(replacementstring)             }         }         return returnstring     }      // set (an alternative myscalarstring = "some string")     mutating func set(_ string: string) {         self.scalararray.removeall(keepingcapacity: false)         s in string.unicodescalars {             self.scalararray.append(s.value)         }     }      // split     func split(atchar splitchar: uint32) -> [scalarstring] {         var partsarray: [scalarstring] = []         if self.scalararray.count == 0 {             return partsarray         }         var part: scalarstring = scalarstring()         scalar in self.scalararray {             if scalar == splitchar {                 partsarray.append(part)                 part = scalarstring()             } else {                 part.append(scalar)             }         }         partsarray.append(part)         return partsarray     }      // startswith     func startswith() -> uint32? {         return self.scalararray.first     }      // substring     func substring(_ startindex: int) -> scalarstring {         // startindex end of string         var subarray: scalarstring = scalarstring()         in startindex..<self.length {             subarray.append(self.scalararray[i])         }         return subarray     }     func substring(_ startindex: int, _ endindex: int) -> scalarstring {         // (startindex inclusive, endindex exclusive)         var subarray: scalarstring = scalarstring()         in startindex..<endindex {             subarray.append(self.scalararray[i])         }         return subarray     }      // tostring     func tostring() -> string {         var string: string = ""          scalar in self.scalararray {             if let validscalor = unicodescalar(scalar) {                 string.append(character(validscalor))             }         }         return string     }      // trim     // removes leading , trailing whitespace (space, tab, newline)     func trim() -> scalarstring {          //var returnstring = scalarstring()         let space: uint32 = 0x00000020         let tab: uint32 = 0x00000009         let newline: uint32 = 0x0000000a          var startindex = self.scalararray.count         var endindex = 0          // leading whitespace         in 0..<self.scalararray.count {             if self.scalararray[i] != space &&                 self.scalararray[i] != tab &&                 self.scalararray[i] != newline {                  startindex =                 break             }         }          // trailing whitespace         in stride(from: (self.scalararray.count - 1), through: 0, by: -1) {             if self.scalararray[i] != space &&                 self.scalararray[i] != tab &&                 self.scalararray[i] != newline {                  endindex = + 1                 break             }         }          if endindex <= startindex {             return scalarstring()         }          return self.substring(startindex, endindex)     }      // values     func values() -> [uint32] {         return self.scalararray     }  }  func ==(left: scalarstring, right: scalarstring) -> bool {     return left.scalararray == right.scalararray }  func +(left: scalarstring, right: scalarstring) -> scalarstring {     var returnstring = scalarstring()     scalar in left.values() {         returnstring.append(scalar)     }     scalar in right.values() {         returnstring.append(scalar)     }     return returnstring } 

Comments

Popular posts from this blog

toolbar - How to add link to user registration inside toobar in admin joomla 3 custom component -

linux - disk space limitation when creating war file -