
Converting Word Documents to HTML with POI (Base64 Images)

CodeWwang
Published 2022-08-24 10:43:44

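Documents are usually submitted by writing them in a rich text editor and uploading the result. But what if you want to upload an existing document first and then keep editing it in the rich text editor?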

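The idea is simple: upload the document, have the backend parse and convert it, return the result to the frontend page, and load it into the rich text editor.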

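The most widely used formats today are Word's doc and docx; Markdown is used mainly by people in the internet industry. That leaves one question: how do we turn a Word document into content a rich text editor can display? The answer is straightforward: convert it to HTML first, then return that HTML to the frontend page.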
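Since the converted HTML travels back to the editor as a single string, images cannot be referenced as separate files; this is where the "base64 images" in the title come in. As a minimal sketch of the idea (the class `DataUriSketch` and its method names are hypothetical, introduced only for illustration), raw image bytes can be inlined into an `<img>` tag as a data URI using nothing but the JDK:

```java
import java.util.Base64;

/** Sketch: embed raw image bytes in an <img> tag as a base64 data URI. */
final class DataUriSketch {

    /** Builds "data:<mime>;base64,<payload>" from raw bytes. */
    static String toDataUri(byte[] imageBytes, String mimeType) {
        String payload = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + payload;
    }

    /** Wraps the data URI in an <img> element that needs no external file. */
    static String toImgTag(byte[] imageBytes, String mimeType) {
        return "<img src=\"" + toDataUri(imageBytes, mimeType) + "\"/>";
    }
}
```

The trade-off is size: base64 output is roughly a third larger than the original bytes, so documents with many large pictures produce correspondingly large HTML strings.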

Implementation

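With the approach settled, let's implement it. First, define an upload API. The details differ between frameworks; all that matters is that the backend can read the file. Here is an upload endpoint based on Spring Boot: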

// Upload endpoint: stores the file, converts it to HTML, and saves the result as a document.
@PostMapping("/upload/{menuId}/{space}")
public ResponseResult uploadDocument(@PathVariable("menuId") String menuId,
                                     @PathVariable("space") String space,
                                     @RequestParam("file") MultipartFile file) {
    FileProperties properties = fileService.storeFile(file);
    Document document = new Document();
    String title;
    String content;
    String originName = properties.getFileName();
    try {
        String lowerName = originName.toLowerCase();
        if (lowerName.endsWith(".docx")) {
            // Cut the extension from the end so dots inside the name are preserved
            title = originName.substring(0, originName.length() - ".docx".length());
            content = fileService.docxToHtml(properties);
        } else if (lowerName.endsWith(".doc")) {
            title = originName.substring(0, originName.length() - ".doc".length());
            content = fileService.docToHtml(properties);
        } else {
            // Anything else is not a Word document
            return ResponseResult.FAILED(GlobalTipMsg.DOC_UPLOAD_FAILTURE);
        }
    } catch (Exception e) {
        e.printStackTrace();
        return ResponseResult.FAILED(GlobalTipMsg.DOC_UPLOAD_FAILTURE);
    }
    document.setTitle(title);
    document.setContent(content);
    return docService.addDocument(space, menuId, document);
}
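The endpoint derives the document title from the file name and picks a converter based on the extension. As a standalone illustration of that step, here is a minimal sketch; `WordFileName` and its `Format` enum are hypothetical names introduced for this example and are not part of the original project. It cuts the extension from the end of the name, so `my.docs.doc` keeps the title `my.docs` (searching for the first `.doc` with `indexOf` would yield `my`), and it reports anything that is neither .doc nor .docx as unsupported.

```java
import java.util.Locale;

/** Sketch: derive a document title and Word format from an uploaded file name. */
final class WordFileName {
    enum Format { DOC, DOCX, UNSUPPORTED }

    final String title;
    final Format format;

    private WordFileName(String title, Format format) {
        this.title = title;
        this.format = format;
    }

    /** Classifies the name by its (case-insensitive) extension. */
    static WordFileName parse(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".docx")) {
            return new WordFileName(stripSuffix(fileName, ".docx"), Format.DOCX);
        }
        if (lower.endsWith(".doc")) {
            return new WordFileName(stripSuffix(fileName, ".doc"), Format.DOC);
        }
        // Not a Word file: keep the name as-is and let the caller reject it
        return new WordFileName(fileName, Format.UNSUPPORTED);
    }

    /** Removes a suffix of known length from the end of the name. */
    private static String stripSuffix(String name, String suffix) {
        return name.substring(0, name.length() - suffix.length());
    }
}
```

A caller would switch on `format` to choose between the .doc and .docx converters and use `title` as the document title.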

Required dependencies:

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.4</version>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

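Next, the uploaded file has to be extracted and stored, then converted. Below is a utility class I put together that saves the uploaded document to the server as a backup and then parses it: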

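// FileProperties and ImageConverter are this project's own classes; import them from your own packages.
import fr.opensagres.poi.xwpf.converter.xhtml.Base64EmbedImgManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import lombok.extern.slf4j.Slf4j;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;
import org.springframework.web.multipart.MultipartFile;
import org.w3c.dom.Document;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

@Service
@Slf4j
public class FileService {

    private final Path fileStorageLocation;

    @Autowired
    public FileService(FileProperties fileProperties) {
        this.fileStorageLocation = Paths.get(fileProperties.getUploadDir()).toAbsolutePath().normalize();
        try {
            Files.createDirectories(this.fileStorageLocation);
        } catch (Exception ex) {
            log.error("Could not create upload directory", ex);
        }
    }

    /** Saves the upload under the storage directory and returns where it was stored. */
    public FileProperties storeFile(MultipartFile file) {
        FileProperties properties = new FileProperties();
        // Normalize the file name
        String fileName = StringUtils.cleanPath(file.getOriginalFilename());
        // Reject names that could escape the upload directory instead of only logging them
        if (fileName.contains("..")) {
            log.error("Sorry! Filename contains invalid path sequence " + fileName);
            return properties;
        }
        Path targetLocation = this.fileStorageLocation.resolve(fileName);
        // Copy the file to the target location (replacing an existing file with the same name)
        try (InputStream input = file.getInputStream()) {
            Files.copy(input, targetLocation, StandardCopyOption.REPLACE_EXISTING);
            // Use the resolved path rather than a hard-coded "\\" so this also works outside Windows
            properties.setFileDir(targetLocation.toString());
            properties.setUploadDir(this.fileStorageLocation.toString());
            properties.setFileName(fileName);
        } catch (IOException ex) {
            log.error("Could not store file " + fileName, ex);
        }
        return properties;
    }

    /** Converts a .docx file to HTML, embedding images as base64. */
    public static String docxToHtml(FileProperties properties) throws Exception {
        try (FileInputStream fileInputStream = new FileInputStream(properties.getFileDir());
             XWPFDocument docxDocument = new XWPFDocument(fileInputStream);
             ByteArrayOutputStream htmlStream = new ByteArrayOutputStream()) {
            XHTMLOptions options = XHTMLOptions.create();
            // Embed images as base64
            options.setImageManager(new Base64EmbedImgManager());
            // Convert to HTML
            XHTMLConverter.getInstance().convert(docxDocument, htmlStream, options);
            return htmlStream.toString();
        }
    }

    /** Converts a legacy .doc file to HTML. */
    public String docToHtml(FileProperties properties)
            throws IOException, ParserConfigurationException, TransformerException {
        HWPFDocumentCore wordDocument;
        try (FileInputStream fileInputStream = new FileInputStream(properties.getFileDir())) {
            wordDocument = WordToHtmlUtils.loadDoc(fileInputStream);
        }
        // ImageConverter is a WordToHtmlConverter subclass whose source is not shown in this article
        WordToHtmlConverter wordToHtmlConverter = new ImageConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument()
        );
        wordToHtmlConverter.processDocument(wordDocument);
        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        // The serializer wrote UTF-8, so decode with the same charset rather than the platform default
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}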
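`storeFile` writes the upload under the configured directory using a file name supplied by the client. A common extra guard, which is not part of the original project, is to normalize the resolved path and confirm it still lies inside the upload directory. The sketch below shows that check with only `java.nio.file`; the class name `SafeUploadPath` and the method `resolveInside` are hypothetical.

```java
import java.nio.file.Path;

/** Sketch: resolve an uploaded file name so the result cannot leave the upload directory. */
final class SafeUploadPath {

    /** Returns the normalized target path, or throws if it would fall outside uploadDir. */
    static Path resolveInside(Path uploadDir, String fileName) {
        Path base = uploadDir.toAbsolutePath().normalize();
        Path target = base.resolve(fileName).normalize();
        // After normalization, "../" segments or an absolute name no longer start with base;
        // an empty name resolves to base itself, which is not a valid file target either.
        if (!target.startsWith(base) || target.equals(base)) {
            throw new IllegalArgumentException("Invalid upload file name: " + fileName);
        }
        return target;
    }
}
```

Checking the normalized path covers cases that a plain substring test for `..` can miss, such as an absolute path passed as the file name.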

OK!

Originally published: 2020/12/28
