javascript - Convert UTF-8 data into the proper string format - Stack Overflow

If I receive a UTF-8 string via a socket (or, for that matter, via any external source), I would like to get it as a properly parsed string object. The following code shows what I mean:

var str='21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';

// Find CRLF
var i=str.indexOf('\r\n');

// Parse size up until CRLF
var x=parseInt(str.slice(0, i));

// Read size bytes
var s=str.substr(i+2, x)

console.log(s);

This code should print

Just a demo string äè

but because the UTF-8 data is not decoded properly, the output stops after the first non-ASCII character:

Just a demo string ä

Would anyone have an idea how to convert this properly?

asked Jul 17, 2014 at 17:24 by user3847784

Comments:
  • You may want to use Punycode, here is a library too: github.com/bestiejs/punycode.js – howderek Commented Jul 17, 2014 at 17:29
  • This might help: stackoverflow.com/questions/17057407/… – Diodeus - James MacFarlane Commented Jul 17, 2014 at 17:29
  • @howderek Thanks, but how would a punycode library help in this case? – user3847784 Commented Jul 17, 2014 at 17:30
  • nvm, I thought you were doing this over http, use this string instead: '21\r\nJust a demo string \xE4\xE8\xC3\xA8-should not be anymore parsed' you simply used the wrong escapes – howderek Commented Jul 17, 2014 at 17:36
2 Answers

It seems you could use decodeURIComponent(escape(str)) to decode the string before parsing it:

var badstr='21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';

var str=decodeURIComponent(escape(badstr));

// Find CRLF
var i=str.indexOf('\r\n');

// Parse size up until CRLF (explicit radix 10)
var x=parseInt(str.slice(0, i), 10);

// Read size characters (str is already decoded at this point)
var s=str.substr(i+2, x);

console.log(s);

BTW, this kind of issue occurs when you mix UTF-8 with other encodings. You should check that as well.
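To see why the trick works, it may help to run the two calls separately. This is only a sketch, and it assumes (as in the question) that every character of the input string carries exactly one byte of the UTF-8 data; the variable names are made up for illustration. Note that escape() is a legacy function, so treat this as an explanation rather than a recommendation.

```javascript
// Each character here stands for one byte of UTF-8 data (assumption from the question).
var raw = '\xC3\xA4\xC3\xA8';
console.log(raw.length); // 4, because the string holds four "byte" characters

// escape() turns each of these characters into a %XX sequence.
var percentEncoded = escape(raw);
console.log(percentEncoded); // %C3%A4%C3%A8

// decodeURIComponent() reads the %XX sequences as UTF-8 bytes.
var decoded = decodeURIComponent(percentEncoded);
console.log(decoded);        // äè
console.log(decoded.length); // 2
```

This is also why the size prefix of 21 only lines up after decoding: before decoding, the two accented letters take up four characters of the string instead of two.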

You should use utf8.js, which is available on npm:

var utf8 = require('utf8');
var encoded = '21\r\nJust a demo string \xC3\xA4\xC3\xA8-foo bar baz';
var decoded = utf8.decode(encoded);
console.log(decoded);
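If adding a dependency is not an option, something similar can probably be done with the built-in TextDecoder (available in current browsers and recent Node.js versions). The following is a minimal sketch, not part of utf8.js, and it again assumes that each character of the received string holds one byte; the variable names are hypothetical.

```javascript
// Assumption: each character of `raw` carries one byte (0-255), as in the question.
var raw = '21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';

// Copy the character codes into a byte array.
var bytes = new Uint8Array(raw.length);
for (var k = 0; k < raw.length; k++) {
  bytes[k] = raw.charCodeAt(k) & 0xFF;
}

// Decode the bytes as UTF-8.
var text = new TextDecoder('utf-8').decode(bytes);

// Same parsing as in the question, now on the decoded text.
var i = text.indexOf('\r\n');
var size = parseInt(text.slice(0, i), 10);
var s = text.slice(i + 2, i + 2 + size);

console.log(s); // Just a demo string äè
```

If the data already arrives as bytes (for example a Node.js Buffer from a socket), the copy loop is unnecessary and the bytes can be passed to decode() directly.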
