Programming Notes

lifelong learning & practice makes perfect

Go strings: what is the difference between index access and for range access?

Index access

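import (
	"log"
	"testing"
	"unsafe"
)
func TestStr(t *testing.T) {
	str := `中\a` // raw string; '中' is 3 bytes in UTF-8 (E4 B8 AD), str[0] is only 0xE4
	log.Println(unsafe.Sizeof(rune('中')), unsafe.Sizeof(str[0]), str[0], string(str[0]))
}
// Output: 2022/09/15 19:55:27 4 1 228 ä  (string(str[0]) reads 228 as U+00E4 'ä')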

for range access

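package main

import "fmt"

func main() {
	const nihongo = "日本語"
	// each iteration decodes one UTF-8 sequence: i is its starting byte offset, v the rune
	for i, v := range nihongo {
		fmt.Printf("%#U starts at byte position %d\n", v, i)
	}
}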
// Output:
// U+65E5 '日' starts at byte position 0
// U+672C '本' starts at byte position 3
// U+8A9E '語' starts at byte position 6
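A rough way to see what for range does for you is to write the loop by hand. The sketch below (decodeAll is a hypothetical helper name) decodes one UTF-8 sequence at each byte offset with utf8.DecodeRuneInString and advances by that sequence's width. On invalid input DecodeRuneInString returns (utf8.RuneError, 1), which matches range's behavior of yielding U+FFFD and moving on by one byte.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll (hypothetical helper) walks s the way for range does: at each
// byte offset it decodes one UTF-8 sequence and advances by its width.
func decodeAll(s string) (offsets []int, runes []rune) {
	for i := 0; i < len(s); {
		r, width := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		runes = append(runes, r)
		i += width // width is 1 for an invalid byte, so the loop always progresses
	}
	return offsets, runes
}

func main() {
	const nihongo = "日本語"
	offsets, runes := decodeAll(nihongo)
	for k := range runes {
		fmt.Printf("%#U starts at byte position %d\n", runes[k], offsets[k])
	}
}
```

The printed lines should match the for range output above; the hand-written version is mainly useful when you need to control the step yourself, e.g. to stop or look ahead mid-string.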

Explanation

To answer the question posed at the beginning: Strings are built from bytes so indexing them yields bytes, not characters. A string might not even hold characters. In fact, the definition of “character” is ambiguous and it would be a mistake to try to resolve the ambiguity by defining that strings are made of characters.

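  • A Go string is a read-only sequence of bytes, usually holding UTF-8 encoded text.
  • Indexing (s[i]) yields a single byte. A non-ASCII character occupies several bytes, so one index does not return the whole character.
  • for range decodes the string as UTF-8 and yields a rune (an alias for int32, 4 bytes) together with the byte offset where that rune starts.
  • UTF-8 is a variable-length encoding: each Unicode code point is encoded in 1 to 4 bytes.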
  • Go source code is always UTF-8.
  • A string holds arbitrary bytes.
  • A string literal, absent byte-level escapes, always holds valid UTF-8 sequences.
  • Those sequences represent Unicode code points, called runes.
  • No guarantee is made in Go that characters in strings are normalized.

