Sometimes, however, we run into files that are not UTF-8 encoded, such as Chinese text in GBK or Russian text in CP1251. Plain text files generally carry no information about their own encoding, which makes them a real nuisance to process.... First, we need to check which languages enca supports on the current system, and which encodings it knows for each:
# enca --list languages
belarusian: CP1251 IBM866 ISO-8859-5 KOI8...-UNI maccyr IBM855 KOI8-U
bulgarian: CP1251 ISO-8859-5 IBM855 maccyr ECMA-113
czech: ISO-8859-2...baltic
polish: ISO-8859-2 CP1250 IBM852 macce ISO-8859-13 ISO-8859-16 baltic CORK
russian: KOI8-R CP1251...-2 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
slovene: ISO-8859-2 CP1250 IBM852 macce CORK
ukrainian: CP1251
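enca detects the encoding statistically for a given language. As a much cruder stand-in, the sketch below just tries a caller-supplied list of candidate codecs with strict decoding and returns the first one that succeeds. The helper name `guess_decode` and the candidate order are assumptions for illustration; since legacy codecs such as cp1251 or gb18030 accept many byte sequences, a successful decode does not prove the guess is right, so the candidate list should be narrowed by language first.

```python
from typing import Iterable, Optional, Tuple


def guess_decode(data: bytes, candidates: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (text, encoding) for the first candidate
    codec that decodes data without errors, or None if none does.

    This is a naive trial decode, not enca's statistical detection.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # bytes are invalid for this codec; try the next one
    return None


# Strict UTF-8 goes first: it rejects most non-UTF-8 input, so success there means something.
result = guess_decode("Привет".encode("cp1251"), ["utf-8", "cp1251"])
print(result)  # ('Привет', 'cp1251')
```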
# Lookup table: replace the input encoding with its superset
CHARSETS = {
    "big5": "big5hkscs",
    "gb2312": "gb18030",
    "ascii": "utf-8",
    "maccyrillic": "cp1251",
    # ...
    "win1251": "cp1251",
    "win-1251": "cp1251",
    "windows-1251": "cp1251",
}
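The lookup table above can be wrapped in a small function that normalises a declared charset name before decoding. This is a sketch: the names `fix_charset` and `decode_declared` and the lower-casing step are assumptions, and only the complete entries from the table are copied (the truncated `maccyrillic` entry is left out).

```python
# Superset table, copied from the complete entries above.
CHARSETS = {
    "big5": "big5hkscs",
    "gb2312": "gb18030",
    "ascii": "utf-8",
    "win1251": "cp1251",
    "win-1251": "cp1251",
    "windows-1251": "cp1251",
}


def fix_charset(encoding: str) -> str:
    """Hypothetical helper: return the superset codec for encoding, or the name unchanged."""
    name = encoding.strip().lower()  # charset declarations are case-insensitive
    return CHARSETS.get(name, name)


def decode_declared(data: bytes, declared: str) -> str:
    # Decode with the superset so characters outside the declared subset still work.
    return data.decode(fix_charset(declared))


print(fix_charset("GB2312"))  # gb18030
print(decode_declared("Привет".encode("cp1251"), "Windows-1251"))  # Привет
```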
Unicode
ISO-8859-1 - Western European
ISO-8859-15 - Western European (adds the Euro sign plus the French and Finnish letters missing from ISO-8859-1)
cp866 - DOS-specific Cyrillic charset
cp1251
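Why the DOS and Windows Cyrillic code pages must not be confused: the same Russian word produces different bytes in each, and decoding with the wrong one silently yields other characters instead of an error. A small sketch using Python's built-in `cp866` and `cp1251` codecs:

```python
word = "Привет"

dos_bytes = word.encode("cp866")   # DOS Cyrillic code page
win_bytes = word.encode("cp1251")  # Windows Cyrillic code page

print(dos_bytes)  # b'\x8f\xe0\xa8\xa2\xa5\xe2'
print(win_bytes)  # b'\xcf\xf0\xe8\xe2\xe5\xf2'

# Reading DOS bytes with the Windows code page does not raise an error;
# it silently produces different characters.
misread = dos_bytes.decode("cp1251")
print(misread == word)  # False
```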
1. Log in to the server and open the vsftpd.conf file:
# vim /etc/vsftpd/vsftpd.conf
2. Add listen_port=8021 at the end of the file:
listen_port=8021
#remote_charset=CP1251
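On the client side, Python's `ftplib` can be told which encoding the server uses for commands, replies and file names through the keyword-only `encoding` argument of `FTP` (Python 3.9+). The sketch assumes the server configured above listens on port 8021 and sends Cyrillic file names in CP1251; the helper names are hypothetical, host and credentials are placeholders, and nothing here configures vsftpd itself.

```python
from typing import List
from ftplib import FTP


def make_cp1251_client() -> FTP:
    # No host is given, so the constructor does not connect yet.
    # encoding= is applied to commands, replies and file names.
    return FTP(encoding="cp1251")


def connect_and_list(host: str, user: str, password: str, port: int = 8021) -> List[str]:
    """Hypothetical helper: log in on the non-default port and list file names."""
    ftp = make_cp1251_client()
    ftp.connect(host, port)
    ftp.login(user, password)
    try:
        return ftp.nlst()  # names come back decoded with cp1251
    finally:
        ftp.quit()
```

Usage, with placeholder values: `connect_and_list("ftp.example.com", "user", "secret")`.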
chinese_ci | 2 |
| binary | Binary pseudo charset | binary | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| cp1251
| 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| cp1251...| 50 | | Yes | 1 | PAD SPACE |
| cp1251_bulgarian_ci | cp1251 |...14 | | Yes | 1 | PAD SPACE |
| cp1251_general_ci | cp1251 | 51 | Yes...| Yes | 1 | PAD SPACE |
| cp1251_general_cs | cp1251 | 52 | |...Yes | 1 | PAD SPACE |
| cp1251_ukrainian_ci | cp1251 | 23 | | Yes
14 | | Yes | 1 |
| cp1251_ukrainian_ci | cp1251 | 23 | | Yes | 1 |
| cp1251_bin...cp1251 | 50 | | Yes | 1 |
| cp1251_general_ci | cp1251 | 51 | Yes | Yes | 1 |
| cp1251_general_cs...cp1251 | 52 | | Yes | 1 |
| utf16_general_ci | utf16 | 54 | Yes | Yes | 1 |
| utf16...
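The `Default` column marks the collation a character set falls back to when none is named. A hypothetical sketch that picks it out of rows shaped like the collation listing above (columns Collation, Charset, Id, Default, as in MySQL's `SHOW COLLATION` output); the rows are typed in by hand from that listing, not fetched from a server.

```python
from typing import List, NamedTuple, Optional


class CollationRow(NamedTuple):
    collation: str
    charset: str
    coll_id: int
    is_default: bool


# cp1251 rows as they appear in the listing above.
ROWS: List[CollationRow] = [
    CollationRow("cp1251_bulgarian_ci", "cp1251", 14, False),
    CollationRow("cp1251_ukrainian_ci", "cp1251", 23, False),
    CollationRow("cp1251_bin", "cp1251", 50, False),
    CollationRow("cp1251_general_ci", "cp1251", 51, True),
    CollationRow("cp1251_general_cs", "cp1251", 52, False),
]


def default_collation(rows: List[CollationRow], charset: str) -> Optional[str]:
    """Return the collation flagged Default=Yes for charset, or None if absent."""
    for row in rows:
        if row.charset == charset and row.is_default:
            return row.collation
    return None


print(default_collation(ROWS, "cp1251"))  # cp1251_general_ci
```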
We need to check which languages enca supports on the current system, and which encodings it knows for each:
[root@slaver1 soft]# enca --list languages
belarusian: CP1251...IBM866 ISO-8859-5 KOI8-UNI maccyr IBM855 KOI8-U
bulgarian: CP1251 ISO-8859-5 IBM855 maccyr ECMA-...
polish: ISO-8859-2 CP1250 IBM852 macce ISO-8859-13 ISO-8859-16 baltic CORK
russian: KOI8-R CP1251...IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
slovene: ISO-8859-2 CP1250 IBM852 macce CORK
ukrainian: CP1251...
CP1156, CP1157, CP1158, CP1160, CP1161, CP1162, CP1163, CP1164, CP1166,
CP1167, CP1250, CP1251
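Once the encoding is known, converting a file to UTF-8 is a decode followed by an encode. A minimal Python sketch, assuming the input is CP1251 (as would be expected for Russian text) and small enough to read into memory; `convert_to_utf8` is a hypothetical name and this does not reproduce enca's own conversion options.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1251") -> None:
    """Hypothetical helper: re-encode the file src from source_encoding to UTF-8 at dst."""
    # Strict decoding (the default) raises UnicodeDecodeError on bytes the source
    # codec cannot map, instead of silently writing replacement characters.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Demo on a throwaway file written in CP1251.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "ru.txt"
    dst = Path(tmp) / "ru.utf8.txt"
    src.write_bytes("Привет, мир".encode("cp1251"))
    convert_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted.decode("utf-8"))  # Привет, мир
```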
16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10 KOI8-R CSKOI8R KOI8-U KOI8-RU CP1250 MS-EE WINDOWS-1250 CP1251
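Many of these names are aliases for the same code page. Python's codec registry also normalises several spellings; a quick check with `codecs.lookup`, using only aliases known to CPython's standard library (names such as `MS-EE` are not assumed to be recognised):

```python
import codecs

# Spellings seen in iconv-style listings, resolved by CPython's codec registry.
for alias in ("CP1251", "WINDOWS-1251", "windows_1251", "1251", "CP1250", "WINDOWS-1250"):
    print(f"{alias:>12} -> {codecs.lookup(alias).name}")
```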
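So what is mainly discussed here is the problem between Windows-1251 (cp1251) and UTF-8; other encodings such as GBK are left aside for now.
2. Solutions
1. Use native JavaScript encoding conversion
But I haven't found a way to do this yet...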
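In Python, the cp1251/UTF-8 problem typically shows up as mojibake: UTF-8 bytes that were decoded as cp1251. A sketch of the conversion and of the repair; the function names are hypothetical, and the repair only works when the wrong decode neither raised nor replaced any bytes (some UTF-8 byte values have no cp1251 mapping in Python's codec).

```python
def cp1251_to_utf8(data: bytes) -> bytes:
    """Convert cp1251-encoded bytes to UTF-8-encoded bytes."""
    return data.decode("cp1251").encode("utf-8")


def repair_utf8_read_as_cp1251(garbled: str) -> str:
    """Undo a wrong decode: re-encode with cp1251 to recover the original
    UTF-8 bytes, then decode them correctly."""
    return garbled.encode("cp1251").decode("utf-8")


original = "Привет"
utf8_bytes = cp1251_to_utf8(original.encode("cp1251"))
garbled = utf8_bytes.decode("cp1251")  # the mistake: UTF-8 bytes read as cp1251
print(garbled)
print(repair_utf8_read_as_cp1251(garbled))  # Привет
```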
Croatian, case-insensitive
cp1250_czech_cs: Czech, case-sensitive
cp1250_general_ci: Central European (multilingual), case-insensitive
cp1251
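These collation names follow a charset_variant_suffix pattern, where the suffix states the comparison rule (`_ci` case-insensitive, `_cs` case-sensitive, `_bin` binary). A hypothetical parser for names like the ones above; it handles only these three suffixes and assumes the charset name contains no underscore, which holds for cp1250 and cp1251 but not for every MySQL charset.

```python
from typing import NamedTuple

RULES = {
    "ci": "case-insensitive",
    "cs": "case-sensitive",
    "bin": "binary",
}


class Collation(NamedTuple):
    charset: str
    variant: str  # e.g. "general", "czech"; empty for *_bin
    rule: str


def parse_collation(name: str) -> Collation:
    """Split e.g. 'cp1250_czech_cs' into charset, variant and comparison rule."""
    parts = name.split("_")
    if len(parts) < 2 or parts[-1] not in RULES:
        raise ValueError(f"unrecognised collation name: {name!r}")
    return Collation(parts[0], "_".join(parts[1:-1]), RULES[parts[-1]])


print(parse_collation("cp1250_czech_cs"))
print(parse_collation("cp1251_bin"))
```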
charset_name can be one of: binary, armscii8, ascii, big5, cp1250, cp1251, cp1256, cp1257, cp850, cp852, cp866, cp932
general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| cp1251
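When bytes come back from MySQL, its charset names do not always match Python codec names. A hypothetical lookup for the charsets mentioned above: the `latin1` entry follows MySQL's documentation that its latin1 is really Windows cp1252, `binary` has no text codec, and any other name (cp1251, cp1250, cp866, big5, ...) is passed to `codecs.lookup` unchanged.

```python
import codecs
from typing import Optional

# MySQL charset name -> Python codec name, only where the names differ (hypothetical table).
MYSQL_TO_PYTHON = {
    "utf8mb4": "utf-8",
    "latin1": "cp1252",
    "latin7": "iso8859_13",  # ISO 8859-13 Baltic
    "binary": None,          # raw bytes, no text decoding
}


def python_codec_for(mysql_charset: str) -> Optional[str]:
    """Return a Python codec name for a MySQL charset, or None for 'binary'."""
    name = mysql_charset.lower()
    if name in MYSQL_TO_PYTHON:
        return MYSQL_TO_PYTHON[name]
    # Names like cp1251 are shared; codecs.lookup raises LookupError if unknown.
    return codecs.lookup(name).name
```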
Standard Code for Information Interchange
| windows-1250 | Cp1250 | Windows Eastern European |
| windows-1251 | Cp1251
It can understand and decode text in any of the following encodings:
Latin-1 (ISO-8859-1)
Windows-1252 (cp1252, used in Microsoft products)
Windows-1251 (cp1251, the Russian version of cp1252
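These three code pages agree on ASCII but differ above 0x7F, which is why text decoded with the wrong one comes out garbled instead of failing. A sketch decoding the same two bytes with each (Python codec names `latin-1`, `cp1252`, `cp1251`):

```python
data = b"\x80\xe0"

results = {enc: data.decode(enc) for enc in ("latin-1", "cp1252", "cp1251")}

for enc, text in results.items():
    # Print code points so that invisible control characters still show up.
    print(enc, [f"U+{ord(ch):04X}" for ch in text])
# latin-1 ['U+0080', 'U+00E0']
# cp1252 ['U+20AC', 'U+00E0']
# cp1251 ['U+0402', 'U+0430']
```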