Issue
I have a string "\ufffd\ufffd hello\n"
and code like this:
fun main() {
    val bs = "\ufffd\ufffd hello\n"
    println(bs) // �� hello
}
Instead, I want to see "\ufffd\ufffd hello". How can I escape \u for every hex value?
UPD:
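val s = """\uffcd"""  // raw string: a literal backslash followed by "uffcd"
// \uXXXX that is not already escaped; group 1 keeps any "\\" pairs in front of it
val unicodeRegex = """(?<!\\)((?:\\\\)*)\\u([A-Fa-f\d]{4})""".toRegex()
fun escapeUnicode(str: String): String =
    str.replace(unicodeRegex, """$1\\\\u$2""")  // "\uffcd" -> "\\uffcd"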
Solution
(I'm interpreting the question as asking how to clearly display a string that contains non-printable characters. The Kotlin compiler converts each \u followed by 4 hex digits in a string literal into a single character, so the question is effectively asking how to convert them back again.)
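A small sketch of that conversion: in a regular string literal each \uXXXX escape becomes one Char, whereas a raw (triple-quoted) string processes no escapes, so the backslash, the u and the hex digits stay as separate characters. The names escaped and raw are just for illustration.

```kotlin
// In a regular string literal, the compiler turns each \ufffd into one Char (U+FFFD).
val escaped = "\ufffd\ufffd hello\n"

// A raw (triple-quoted) string processes no escapes, so each \ufffd stays six characters.
val raw = """\ufffd\ufffd hello"""

fun main() {
    println(escaped.length)               // 9: two U+FFFD chars, a space, "hello", '\n'
    println(raw.length)                   // 18: each "\ufffd" is six characters
    println(escaped[0].code.toString(16)) // fffd
}
```

So by the time the program runs, nothing in the first string remembers that it was written as an escape.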
Unfortunately, there's no built-in way of doing this. It's fairly easy to write one, but it's a bit subjective, as there's no single definition of what's ‘printable’…
Here's an extension function that probably does roughly what you want:
// Escapes characters whose Unicode category is usually not printable.
fun String.printable() = map {
    // Character.getType returns an Int; the category constants are Bytes
    when (Character.getType(it).toByte()) {
        Character.CONTROL, Character.FORMAT, Character.PRIVATE_USE,
        Character.SURROGATE, Character.UNASSIGNED, Character.OTHER_SYMBOL
            -> "\\u%04x".format(it.code)
        else -> it.toString()
    }
}.joinToString("")
println("\ufffd\ufffd hello\n".printable()) // prints ‘\ufffd\ufffd hello\u000a’
The sample string in the question is a bad example, because \uFFFD is the replacement character: a black diamond with a question mark, usually shown in place of characters that can't be displayed. So the replacement character itself is displayable!
The code above treats it as non-displayable by including Character.OTHER_SYMBOL among the escaped types, but that also escapes many other symbols that display fine (© and °, for example). So you'll probably want to remove it, leaving just the other 5 types. (I got those from this answer.)
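As a sketch of that suggestion, here is the same approach with Character.OTHER_SYMBOL dropped, so U+FFFD (whose type is OTHER_SYMBOL) is printed as-is while the newline is still escaped. The names hiddenTypes and printableNoSymbols are made up for this example.

```kotlin
// The five "non-printable" categories, without OTHER_SYMBOL.
val hiddenTypes = setOf(
    Character.CONTROL, Character.FORMAT, Character.PRIVATE_USE,
    Character.SURROGATE, Character.UNASSIGNED
)

fun String.printableNoSymbols(): String = map {
    // Character.getType returns an Int; the category constants are Bytes.
    if (Character.getType(it).toByte() in hiddenTypes) "\\u%04x".format(it.code)
    else it.toString()
}.joinToString("")

fun main() {
    // U+FFFD really is an OTHER_SYMBOL.
    println(Character.getType('\ufffd') == Character.OTHER_SYMBOL.toInt()) // true
    // Prints the two U+FFFD characters, then " hello\u000a".
    println("\ufffd\ufffd hello\n".printableNoSymbols())
}
```

Using a Set keeps the list of categories in one place, which makes it easy to tweak if your idea of ‘printable’ differs.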
Because the trailing newline is non-displayable, it gets converted to a hex code too. You could extend the code to handle the escape codes \t, \b, \n, \r and maybe \\ too if needed. (You could also make it more efficient… this was done for brevity!)
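One possible sketch of that extension, which also replaces the map/joinToString pair with a single buildString pass. It assumes you want \t, \b, \n, \r and \\ written as named escapes and \uXXXX for the remaining hidden categories (with OTHER_SYMBOL left out); escaped() and hiddenTypes are hypothetical names.

```kotlin
// Categories treated as non-printable (OTHER_SYMBOL deliberately left out).
val hiddenTypes = setOf(
    Character.CONTROL, Character.FORMAT, Character.PRIVATE_USE,
    Character.SURROGATE, Character.UNASSIGNED
)

// Appends to one StringBuilder instead of building a List<String> and joining it.
fun String.escaped(): String = buildString {
    for (c in this@escaped) {
        when (c) {
            '\t' -> append("\\t")
            '\b' -> append("\\b")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\\' -> append("\\\\")
            else -> if (Character.getType(c).toByte() in hiddenTypes) {
                append("\\u%04x".format(c.code))
            } else {
                append(c)
            }
        }
    }
}

fun main() {
    // Prints the two U+FFFD characters, then " hello\n" with a literal backslash-n.
    println("\ufffd\ufffd hello\n".escaped())
    println("tab\there, back\\slash".escaped()) // tab\there, back\\slash
}
```

The named escapes are checked first, so a tab or newline never falls through to the \uXXXX branch even though it is a CONTROL character.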
Answered By - gidds