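I am trying to read a Tamil text file in Java using NetBeans. The output I get is just small blank boxes. My main requirement is to read the Tamil text file and split each sentence into words. The code is given below; please check it and suggest how I can get this working.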
package javaapplication6;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.Character.UnicodeBlock;

class words {
    private static String[] words;
    private static String[] word;

    public boolean isTamil(String text) {
        boolean result = true;
        UnicodeBlock tamilBlock = UnicodeBlock.forName("TAMIL");
        for (int i = 0; i < text.length(); i++) {
            UnicodeBlock charBlock = UnicodeBlock.of(text.charAt(i));
            if (!tamilBlock.equals(charBlock)) {
                result = false;
            }
        }
        return result;
    }

    public static void split(String[] query, String[] words) throws IOException {
        String s = "This is a sample sentence.";
        String[] word = s.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            // You may want to check for a non-word character before blindly
            // performing a replacement
            // It may also be necessary to adjust the character class
            word[i] = word[i].replaceAll("", "");
        }
    }

    public static void main(String[] args) throws FileNotFoundException, IOException {
        // TODO code application logic here
        // String fileName = "W:/head.txt";
        FileInputStream fstream = new FileInputStream("W:/first.txt");
        BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
        String strLine;
        // Read File Line By Line
        while ((strLine = br.readLine()) != null) {
            // Print the content on the console
            split(word, words);
            System.out.println(strLine);
        }
        br.close();
    }
}
Posted on 2014-12-22 04:35:17
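This is a character-encoding problem. An IDE such as NetBeans uses the operating system's default encoding, so text in a different encoding is printed as boxes or other funky characters.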
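To illustrate the mismatch, here is a minimal sketch that decodes the same UTF-8 bytes with two different charsets. The class name `EncodingMismatchDemo`, the helper `decode`, and the sample word are made up for this illustration, and ISO-8859-1 merely stands in for whatever the platform default happens to be on a given machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Hypothetical helper: decodes raw bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The Tamil word for "Tamil", written with Unicode escapes so the
        // source file itself does not depend on any particular encoding.
        String original = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Decoding with a different charset (an assumed stand-in for a
        // non-UTF-8 platform default) yields unrelated characters.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("Platform default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 round trip intact:  " + right.equals(original));
        System.out.println("ISO-8859-1 result intact: " + wrong.equals(original));
    }
}
```

Each of the five Tamil characters occupies three bytes in UTF-8, so the mis-decoded string has 15 characters instead of 5. The code in the question can hit the same kind of mismatch because `new InputStreamReader(fstream)` without a charset argument falls back to the platform default.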
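A possible solution is to set the character encoding to UTF-8.
Open the Projects pane if it is not already visible (Window > Projects). In the tree view, right-click your project name and click Properties. Make sure the "Sources" item in the left-hand menu is highlighted; you should see "Encoding:" with a select box next to it. Choose UTF-8 there and click OK.
That's it.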
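Independently of the IDE setting, the charset can also be named explicitly in the code. The following is a sketch rather than a definitive fix: it assumes the file is actually saved as UTF-8, and the names `TamilWordReader`, `splitWords`, and `readWords` are invented for this example. It reads the file as UTF-8, splits every line on whitespace, and prints the words through a UTF-8 `PrintStream`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class TamilWordReader {

    // Splits one line into words on runs of whitespace; empty tokens are skipped.
    static List<String> splitWords(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Reads the file as UTF-8 (assumed encoding), ignoring the platform default.
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                words.addAll(splitWords(line));
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question; pass a different path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "W:/first.txt");
        // Encode console output as UTF-8 as well, instead of the platform default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        for (String word : readWords(file)) {
            out.println(word);
        }
    }
}
```

If the words still show up as boxes after this, the text is probably being decoded correctly and the remaining issue is likely on the display side, for example the output window using a font without Tamil glyphs. If the file is not valid UTF-8, `Files.newBufferedReader` reports a `MalformedInputException` rather than silently producing garbage, which helps tell the two cases apart.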
https://stackoverflow.com/questions/21720504