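We usually submit documents the conventional way: write them in a rich text editor and upload the result. But what if we want to upload an existing document first and then edit it in the rich text editor?
The idea is to upload the document, have the back end parse and convert it, return the result to the front-end page, and let the rich text editor load it.
The most widely used formats today are Word's doc and docx; Markdown is used mainly by people in the internet industry. That leaves one question: how do we turn a Word document into content a rich text editor can work with? It's simple: convert it to HTML first, then return the HTML to the front-end page.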
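Since the two Word formats need different converters, the back end first has to decide which one it was given. Here is a minimal sketch of that decision based only on the file name. The `WordFormat` enum and its methods are hypothetical names introduced for illustration and are not part of the code in this post; it compares the extension case-insensitively and strips it from the end of the name, so a name such as `my.document.doc` keeps its inner dots in the title.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: classifies an uploaded file name as .doc or .docx. */
enum WordFormat {
    DOC(".doc"),
    DOCX(".docx");

    private final String extension;

    WordFormat(String extension) {
        this.extension = extension;
    }

    /** Returns the matching format, or empty if the name is not a Word file. */
    static Optional<WordFormat> fromFileName(String fileName) {
        if (fileName == null) {
            return Optional.empty();
        }
        // Lower-case once so that "Report.DOCX" is recognized as well
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (WordFormat format : values()) {
            if (lower.endsWith(format.extension)) {
                return Optional.of(format);
            }
        }
        return Optional.empty();
    }

    /** Removes this format's extension from the end of the name to get a title. */
    String titleOf(String fileName) {
        return fileName.substring(0, fileName.length() - extension.length());
    }

    public static void main(String[] args) {
        String name = "my.document.doc";
        // Prints the detected format and the derived title, if the name is a Word file
        fromFileName(name).ifPresent(format ->
                System.out.println(format + " -> " + format.titleOf(name)));
    }
}
```

A caller could branch on the returned value to pick the doc or docx conversion, and reject the upload when the result is empty.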
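With the idea in place, let's implement it. First define an upload API. The details differ between frameworks; all that matters is that the back end can read the file. Here is an upload endpoint based on Spring Boot: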
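@PostMapping("/upload/{menuId}/{space}")
public ResponseResult uploadDocument(@PathVariable("menuId") String menuId,
                                     @PathVariable("space") String space,
                                     @RequestParam("file") MultipartFile file) {
    // Save the uploaded file on the server first, then convert the stored copy
    FileProperties properties = fileService.storeFile(file);
    Document document = new Document();
    String title;
    String content;
    String originName = properties.getFileName();
    if (originName == null) {
        // storeFile could not save the file, so there is nothing to convert
        return ResponseResult.FAILED(GlobalTipMsg.DOC_UPLOAD_FAILTURE);
    }
    try {
        // Match the extension case-insensitively and strip it from the end of the name
        String lowerName = originName.toLowerCase();
        if (lowerName.endsWith(".docx")) {
            title = originName.substring(0, originName.length() - ".docx".length());
            content = fileService.docxToHtml(properties);
        } else if (lowerName.endsWith(".doc")) {
            title = originName.substring(0, originName.length() - ".doc".length());
            content = fileService.docToHtml(properties);
        } else {
            // Only .doc and .docx can be converted
            return ResponseResult.FAILED(GlobalTipMsg.DOC_UPLOAD_FAILTURE);
        }
    } catch (Exception e) {
        e.printStackTrace();
        return ResponseResult.FAILED(GlobalTipMsg.DOC_UPLOAD_FAILTURE);
    }
    document.setTitle(title);
    document.setContent(content);
    return docService.addDocument(space, menuId, document);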
}
The required dependencies:
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.4</version>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
Next, the uploaded file has to be extracted, then converted and saved. Below is a utility class I put together; it stores a backup copy of the document on the server and then parses it:
@Service
@Slf4j
public class FileService {
    // Absolute, normalized directory where uploaded files are stored
    private final Path fileStorageLocation;

    @Autowired
    public FileService(FileProperties fileProperties) {
        this.fileStorageLocation = Paths.get(fileProperties.getUploadDir()).toAbsolutePath().normalize();
        try {
            // Create the upload directory (and any missing parents) at startup
            Files.createDirectories(this.fileStorageLocation);
        } catch (Exception ex) {
            log.error(ex.toString());
        }
    }
    public FileProperties storeFile(MultipartFile file) {
        FileProperties properties = new FileProperties();
        String originalName = file.getOriginalFilename();
        if (originalName == null) {
            log.error("Uploaded file has no original file name");
            return properties;
        }
        // Normalize file name
        String fileName = StringUtils.cleanPath(originalName);
        // Reject names with invalid path sequences instead of only logging them
        if (fileName.contains("..")) {
            log.error("Sorry! Filename contains invalid path sequence " + fileName);
            return properties;
        }
        Path targetLocation = this.fileStorageLocation.resolve(fileName);
        // try-with-resources closes the stream even if the copy fails
        try (InputStream input = file.getInputStream()) {
            // Copy file to the target location (replacing an existing file with the same name)
            Files.copy(input, targetLocation, StandardCopyOption.REPLACE_EXISTING);
            // Use the resolved path instead of a hard-coded "\\" separator, which only works on Windows
            properties.setFileDir(targetLocation.toString());
            properties.setUploadDir(this.fileStorageLocation.toString());
            properties.setFileName(fileName);
        } catch (IOException ex) {
            log.error(ex.toString());
        }
        return properties;
    }
    public static String docxToHtml(FileProperties properties) throws Exception {
        // try-with-resources closes the streams and the document even if the conversion fails
        try (FileInputStream fileInputStream = new FileInputStream(properties.getFileDir());
             XWPFDocument docxDocument = new XWPFDocument(fileInputStream);
             ByteArrayOutputStream htmlStream = new ByteArrayOutputStream()) {
            XHTMLOptions options = XHTMLOptions.create();
            // Embed images in the HTML as Base64
            options.setImageManager(new Base64EmbedImgManager());
            // Convert to HTML
            XHTMLConverter.getInstance().convert(docxDocument, htmlStream, options);
            return htmlStream.toString();
        }
    }
    public String docToHtml(FileProperties properties) throws IOException, ParserConfigurationException, TransformerException {
        HWPFDocumentCore wordDocument;
        // Close the file stream once the document has been loaded
        try (FileInputStream input = new FileInputStream(properties.getFileDir())) {
            wordDocument = WordToHtmlUtils.loadDoc(input);
        }
        // ImageConverter is a custom WordToHtmlConverter subclass that is not shown in this post
        WordToHtmlConverter wordToHtmlConverter = new ImageConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument()
        );
        wordToHtmlConverter.processDocument(wordDocument);
        // This Document is org.w3c.dom.Document, holding the generated HTML tree
        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer serializer = transformerFactory.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        // The serializer wrote UTF-8 bytes, so decode them as UTF-8 rather than the platform default
        return out.toString("UTF-8");
    }
}
OK!
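One more note on the `..` check in `storeFile`: a string check on the file name is easy to get subtly wrong, so a stricter variant resolves the name against the upload directory and verifies that the result is still inside it. The sketch below uses only `java.nio.file`; the `UploadPaths` class and `resolveInside` method are hypothetical names for illustration, not part of the utility class above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: resolves an uploaded file name inside the upload directory. */
final class UploadPaths {
    private UploadPaths() {
    }

    /**
     * Resolves fileName against uploadDir and rejects any name whose
     * normalized path would end up outside that directory.
     */
    static Path resolveInside(Path uploadDir, String fileName) {
        Path root = uploadDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the containment check
        Path target = root.resolve(fileName).normalize();
        if (!target.startsWith(root) || target.equals(root)) {
            throw new IllegalArgumentException("Invalid file name: " + fileName);
        }
        return target;
    }

    public static void main(String[] args) {
        Path root = Paths.get("uploads");
        System.out.println(resolveInside(root, "report.docx"));
        try {
            resolveInside(root, "../secret.docx");
        } catch (IllegalArgumentException ex) {
            System.out.println("Rejected: " + ex.getMessage());
        }
    }
}
```

Unlike the `contains("..")` check, this accepts a harmless name such as `a..b.docx` and still rejects absolute paths and names that climb out of the directory.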