Problem
A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:
For a 1-byte character, the first bit is 0, followed by its Unicode code.
For an n-byte character, the first n bits of the first byte are all ones and the (n+1)th bit is 0; it is followed by n-1 bytes whose two most significant bits are 10.
This is how the UTF-8 encoding would work:
Char. number range (hexadecimal) | UTF-8 octet sequence (binary) |
---|---|
0000 0000-0000 007F | 0xxxxxxx |
0000 0080-0000 07FF | 110xxxxx 10xxxxxx |
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
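The table can be read as a set of bit masks applied to the first byte. Below is a minimal sketch of that reading. The class `Utf8LeadByte` and the helper `sequenceLength` are hypothetical names introduced for illustration; they are not part of the original solution. The helper returns the number of bytes the table assigns to a lead byte, or -1 for a byte that cannot start a character.

```java
final class Utf8LeadByte {
    // Hypothetical helper: total bytes in the character that starts with this byte,
    // or -1 if the byte matches none of the lead-byte patterns in the table.
    static int sequenceLength(int value) {
        int b = value & 0xFF;                            // keep only the low 8 bits
        if ((b & 0b1000_0000) == 0b0000_0000) return 1;  // 0xxxxxxx
        if ((b & 0b1110_0000) == 0b1100_0000) return 2;  // 110xxxxx
        if ((b & 0b1111_0000) == 0b1110_0000) return 3;  // 1110xxxx
        if ((b & 0b1111_1000) == 0b1111_0000) return 4;  // 11110xxx
        return -1;                                       // 10xxxxxx or 11111xxx
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(197)); // 11000101 -> 2
        System.out.println(sequenceLength(235)); // 11101011 -> 3
        System.out.println(sequenceLength(130)); // 10000010 -> -1 (continuation byte)
    }
}
```

Each mask keeps the prefix bits of one table row and compares them with that row's fixed pattern, so a continuation byte (10xxxxxx) falls through every test.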
Given an array of integers representing the data, return whether it is a valid UTF-8 encoding.
Note:
The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data, so each integer represents exactly 1 byte of data.
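One way to read this note is to mask each integer with `0xFF` before inspecting it. The sketch below assumes that reading; `Utf8Octets` and `toOctet` are hypothetical names, not part of the problem or the original solution. It renders an input value as the 8-bit octet string used in the examples.

```java
final class Utf8Octets {
    // Hypothetical helper: the low 8 bits of value as an 8-character binary string.
    static String toOctet(int value) {
        int b = value & 0xFF;                          // drop everything above bit 7
        String bits = Integer.toBinaryString(b);       // no leading zeros
        return String.format("%8s", bits).replace(' ', '0'); // left-pad to 8 digits
    }

    public static void main(String[] args) {
        System.out.println(toOctet(197)); // 11000101
        System.out.println(toOctet(1));   // 00000001
        System.out.println(toOctet(453)); // 453 = 256 + 197, so also 11000101
    }
}
```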
Example 1:
data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001. Return true. It is a valid UTF-8 encoding of a 2-byte character followed by a 1-byte character.
Example 2:
data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100. Return false. The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character. The next byte is a continuation byte that starts with 10, which is correct. But the second continuation byte does not start with 10, so the sequence is invalid.
Solution
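class Solution {
    public boolean validUtf8(int[] data) {
        if (data == null || data.length == 0) return false;
        for (int i = 0; i < data.length; i++) {
            // A value above 255 does not fit in a single byte.
            if (data[i] > 255) return false;
            int count; // total number of bytes in the character starting at index i
            if (data[i] < 128) {
                count = 1;        // 0xxxxxxx
            } else if (data[i] < 192) {
                return false;     // 10xxxxxx: a continuation byte cannot start a character
            } else if (data[i] < 224) {
                count = 2;        // 110xxxxx
            } else if (data[i] < 240) {
                count = 3;        // 1110xxxx
            } else if (data[i] < 248) {
                count = 4;        // 11110xxx
            } else {
                return false;     // 11111xxx is not a valid first byte
            }
            // The next count-1 bytes must all be continuation bytes (128..191).
            for (int j = 1; j < count; j++) {
                if (i + j >= data.length) return false;
                if (data[i + j] < 128 || data[i + j] >= 192) return false;
            }
            i = i + count - 1; // skip the continuation bytes just checked
        }
        return true;
    }
}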
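The same check can also be written with bit masks and a counter of continuation bytes still expected, which avoids the inner loop. This is an alternative sketch, not the solution above. The class name `Utf8Check` is hypothetical, and the sketch assumes that bits above the low 8 are ignored rather than rejected. Unlike the solution above, it treats an empty array as valid.

```java
final class Utf8Check {
    // Alternative sketch: single pass, tracking how many continuation bytes are owed.
    static boolean validUtf8(int[] data) {
        int remaining = 0; // continuation bytes still expected for the current character
        for (int value : data) {
            int b = value & 0xFF; // assumption: only the low 8 bits matter
            if (remaining > 0) {
                if ((b & 0b1100_0000) != 0b1000_0000) return false; // must be 10xxxxxx
                remaining--;
            } else if ((b & 0b1000_0000) == 0) {
                remaining = 0;                                      // 0xxxxxxx
            } else if ((b & 0b1110_0000) == 0b1100_0000) {
                remaining = 1;                                      // 110xxxxx
            } else if ((b & 0b1111_0000) == 0b1110_0000) {
                remaining = 2;                                      // 1110xxxx
            } else if ((b & 0b1111_1000) == 0b1111_0000) {
                remaining = 3;                                      // 11110xxx
            } else {
                return false; // continuation byte or 11111xxx where a lead byte is needed
            }
        }
        return remaining == 0; // a truncated final character is invalid
    }

    public static void main(String[] args) {
        System.out.println(validUtf8(new int[] {197, 130, 1})); // Example 1: true
        System.out.println(validUtf8(new int[] {235, 140, 4})); // Example 2: false
    }
}
```

An input such as [128, 128, 128] is a useful extra test case: it begins with a continuation byte, so any correct validator should reject it.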