Convert PowerPoint to PDF and PDF to Images using Java

Introduction

In this blog post, we'll walk through how to convert a PowerPoint presentation (PPTX) to a PDF and then convert that PDF into a series of images using Java. We'll use Apache POI to read and render the PPTX slides, iText to build the PDF, and PDFBox to render the PDF pages to images. Note that each slide is rasterized before it is added to the PDF, so the resulting PDF contains pictures of the slides rather than selectable text. Let's get started!

Prerequisites

Before we begin, make sure you have the following set up:

  1. Java Development Kit (JDK) installed.

  2. Maven for dependency management.

  3. An IDE such as IntelliJ IDEA or Eclipse.

Maven Dependencies

First, let's define our project dependencies in the pom.xml file. Add the following dependencies to your Maven project:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>ppt-to-pdf</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <!-- Apache POI for reading and rendering PPTX files -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>5.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>5.2.3</version>
        </dependency>

        <!-- iText for creating PDF files -->
        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itextpdf</artifactId>
            <version>5.5.13.3</version>
        </dependency>

        <!-- PDFBox for rendering PDF files -->
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>2.0.27</version>
        </dependency>
        <!-- pdfbox-tools provides ImageIOUtil for writing rendered pages;
             the standalone pdfbox-app bundle is not needed as a library dependency -->
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox-tools</artifactId>
            <version>2.0.27</version>
        </dependency>

        <!-- Log4j2 dependencies -->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.20.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>2.20.0</version>
        </dependency>
    </dependencies>
</project>

Code Explanation

  1. Convert PPTX to PDF

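We'll start by converting the PowerPoint presentation to a PDF. The listing below is the complete program and also includes the PDF-to-image step covered in the second part.
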
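import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFSlide;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfWriter;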

import javax.imageio.ImageIO;
import java.awt.Graphics2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        String pptxFileName = "Sample.pptx";
        String pdfFileName = "presentation.pdf";

        // Convert PPTX to PDF
        // try-with-resources closes the input stream and the slide show when done
        try (FileInputStream inputStream = new FileInputStream(new File(pptxFileName));
             XMLSlideShow ppt = new XMLSlideShow(inputStream)) {

            Document pdfDocument = new Document(new Rectangle((float) ppt.getPageSize().getWidth(), (float) ppt.getPageSize().getHeight()));
            PdfWriter writer = PdfWriter.getInstance(pdfDocument, new FileOutputStream(pdfFileName));
            pdfDocument.open();

            for (XSLFSlide slide : ppt.getSlides()) {
                BufferedImage img = new BufferedImage((int) ppt.getPageSize().getWidth(), (int) ppt.getPageSize().getHeight(), BufferedImage.TYPE_INT_RGB);
                Graphics2D graphics = img.createGraphics();
                graphics.setRenderingHint(java.awt.RenderingHints.KEY_ANTIALIASING, java.awt.RenderingHints.VALUE_ANTIALIAS_ON);
                graphics.setRenderingHint(java.awt.RenderingHints.KEY_RENDERING, java.awt.RenderingHints.VALUE_RENDER_QUALITY);
                graphics.setRenderingHint(java.awt.RenderingHints.KEY_INTERPOLATION, java.awt.RenderingHints.VALUE_INTERPOLATION_BICUBIC);

                // Set background color
                graphics.setColor(java.awt.Color.WHITE);
                graphics.fill(new Rectangle2D.Float(0, 0, img.getWidth(), img.getHeight()));

                // Draw slide content on the graphics
                slide.draw(graphics);

                // Encode the rendered image as PNG bytes
                ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
                ImageIO.write(img, "png", byteArrayOutputStream);
                byte[] bytes = byteArrayOutputStream.toByteArray();

                // Add image to PDF document
                Image pdfImage = Image.getInstance(bytes);
                float pdfWidth = pdfDocument.getPageSize().getWidth();
                float pdfHeight = pdfDocument.getPageSize().getHeight();
                float imgWidth = pdfImage.getWidth();
                float imgHeight = pdfImage.getHeight();

                float scaleX = pdfWidth / imgWidth;
                float scaleY = pdfHeight / imgHeight;
                float scale = Math.min(scaleX, scaleY);

                pdfImage.scaleAbsolute(imgWidth * scale, imgHeight * scale);
                pdfImage.setAbsolutePosition(
                        (pdfWidth - imgWidth * scale) / 2,
                        (pdfHeight - imgHeight * scale) / 2
                );

                pdfDocument.add(pdfImage);
                pdfDocument.newPage(); // Start a new page for the next slide

                graphics.dispose();
            }

            pdfDocument.close();
            System.out.println("PPTX converted to PDF successfully!");

        } catch (Exception e) {
            e.printStackTrace();
        }

        // Convert PDF to images
        try (PDDocument document = PDDocument.load(new File(pdfFileName))) {
            PDFRenderer pdfRenderer = new PDFRenderer(document);

            for (int page = 0; page < document.getNumberOfPages(); ++page) {
                BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300);
                String outputFileName = pdfFileName.replace(".pdf", "") + "-" + (page + 1) + ".png";
                ImageIOUtil.writeImage(bim, outputFileName, 300);
            }

            System.out.println("PDF converted to images successfully!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Step-by-Step Explanation

Step 1: Reading the PPTX File

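We start by reading the PPTX file into an XMLSlideShow object. Opening the stream and the slide show in a try-with-resources statement ensures both are closed once the conversion finishes.

try (FileInputStream inputStream = new FileInputStream(new File(pptxFileName));
     XMLSlideShow ppt = new XMLSlideShow(inputStream)) {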

Step 2: Creating a PDF Document

Next, we create a new PDF document with the same dimensions as the PPTX slides.

Document pdfDocument = new Document(new Rectangle((float) ppt.getPageSize().getWidth(), (float) ppt.getPageSize().getHeight()));
PdfWriter writer = PdfWriter.getInstance(pdfDocument, new FileOutputStream(pdfFileName));
pdfDocument.open();
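
Both POI's page size and iText's Rectangle are expressed in points (1/72 of an inch), which is why the slide dimensions can be passed straight through. To give a feel for what those numbers mean, here is a small sketch that converts points to inches and to pixels at a given DPI. The PointUnits class and the 720 x 540 point slide size are assumptions made up for this example; check ppt.getPageSize() for your own deck.

```java
public class PointUnits {
    /** Number of points in one inch in the PDF coordinate system. */
    public static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in points to inches. */
    public static double toInches(double points) {
        return points / POINTS_PER_INCH;
    }

    /** Converts a length in points to whole pixels at the given DPI (rounded). */
    public static int toPixels(double points, double dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // Assumed example values: a 720 x 540 point slide
        double widthPt = 720;
        double heightPt = 540;

        System.out.println("Inches: " + toInches(widthPt) + " x " + toInches(heightPt));
        System.out.println("Pixels at 300 DPI: "
                + toPixels(widthPt, 300) + " x " + toPixels(heightPt, 300));
    }
}
```

For a 720 x 540 point page, this works out to 3000 x 2250 pixels at 300 DPI. The exact size a renderer produces may differ by a pixel depending on how it rounds.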

Step 3: Processing Each Slide

For each slide in the PPTX file, we create a BufferedImage, draw the slide content onto this image, convert it to a byte array, and add it to the PDF document.

for (XSLFSlide slide : ppt.getSlides()) {
    BufferedImage img = new BufferedImage((int) ppt.getPageSize().getWidth(), (int) ppt.getPageSize().getHeight(), BufferedImage.TYPE_INT_RGB);
    Graphics2D graphics = img.createGraphics();
    graphics.setRenderingHint(java.awt.RenderingHints.KEY_ANTIALIASING, java.awt.RenderingHints.VALUE_ANTIALIAS_ON);
    graphics.setRenderingHint(java.awt.RenderingHints.KEY_RENDERING, java.awt.RenderingHints.VALUE_RENDER_QUALITY);
    graphics.setRenderingHint(java.awt.RenderingHints.KEY_INTERPOLATION, java.awt.RenderingHints.VALUE_INTERPOLATION_BICUBIC);

    // Set background color
    graphics.setColor(java.awt.Color.WHITE);
    graphics.fill(new Rectangle2D.Float(0, 0, img.getWidth(), img.getHeight()));

    // Draw slide content on the graphics
    slide.draw(graphics);

    // Encode the rendered image as PNG bytes
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    ImageIO.write(img, "png", byteArrayOutputStream);
    byte[] bytes = byteArrayOutputStream.toByteArray();

    // Add image to PDF document
    Image pdfImage = Image.getInstance(bytes);
    float pdfWidth = pdfDocument.getPageSize().getWidth();
    float pdfHeight = pdfDocument.getPageSize().getHeight();
    float imgWidth = pdfImage.getWidth();
    float imgHeight = pdfImage.getHeight();

    float scaleX = pdfWidth / imgWidth;
    float scaleY = pdfHeight / imgHeight;
    float scale = Math.min(scaleX, scaleY);

    pdfImage.scaleAbsolute(imgWidth * scale, imgHeight * scale);
    pdfImage.setAbsolutePosition(
            (pdfWidth - imgWidth * scale) / 2,
            (pdfHeight - imgHeight * scale) / 2
    );

    pdfDocument.add(pdfImage);
    pdfDocument.newPage(); // Start a new page for the next slide

    graphics.dispose();
}
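
The scaling arithmetic in the loop above can be looked at in isolation. The sketch below computes the same "fit inside the page and center" placement using plain floats, with no PDF library involved. FitToPage and its fit method are hypothetical names introduced for this illustration. In the loop above the image is created at the page size, so the scale factor comes out close to 1; the calculation matters more when the image and page sizes differ.

```java
public class FitToPage {
    /**
     * Scales an image to fit inside a page while keeping its aspect ratio,
     * then centers it. Returns {scaledWidth, scaledHeight, x, y}.
     */
    public static float[] fit(float pageWidth, float pageHeight,
                              float imgWidth, float imgHeight) {
        // Use the smaller factor so neither dimension overflows the page
        float scale = Math.min(pageWidth / imgWidth, pageHeight / imgHeight);
        float scaledWidth = imgWidth * scale;
        float scaledHeight = imgHeight * scale;

        // Split the leftover space evenly on both sides
        float x = (pageWidth - scaledWidth) / 2;
        float y = (pageHeight - scaledHeight) / 2;
        return new float[] { scaledWidth, scaledHeight, x, y };
    }

    public static void main(String[] args) {
        // A square 1080 x 1080 image on a 720 x 540 page (assumed example sizes)
        float[] placement = fit(720f, 540f, 1080f, 1080f);
        System.out.println("size: " + placement[0] + " x " + placement[1]
                + ", offset: " + placement[2] + ", " + placement[3]);
    }
}
```

With these example sizes the image is scaled to 540 x 540 and placed 90 points in from the left edge, which mirrors what scaleAbsolute and setAbsolutePosition receive in the loop.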

Step 4: Closing the PDF Document

Finally, we close the PDF document to complete the conversion.

pdfDocument.close();
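
One thing worth knowing about the loop in Step 3: the BufferedImage is created at one pixel per point, so each slide is captured at roughly 72 DPI. A possible refinement, sketched here with plain AWT only, is to allocate a larger image and scale the Graphics2D before drawing. SupersampledCanvas is a hypothetical helper, and the Consumer passed in is a stand-in for the slide.draw(graphics) call; nothing here comes from POI itself.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.util.function.Consumer;

public class SupersampledCanvas {
    /**
     * Renders content defined in slide (point) coordinates onto an image
     * that is 'scale' times larger in each dimension.
     * The painter is a stand-in for slide.draw(graphics).
     */
    public static BufferedImage render(int widthPt, int heightPt, double scale,
                                       Consumer<Graphics2D> painter) {
        int widthPx = (int) Math.ceil(widthPt * scale);
        int heightPx = (int) Math.ceil(heightPt * scale);
        BufferedImage img = new BufferedImage(widthPx, heightPx, BufferedImage.TYPE_INT_RGB);

        Graphics2D g = img.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                    RenderingHints.VALUE_ANTIALIAS_ON);

            // White background, filled in pixel coordinates before scaling
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, widthPx, heightPx);

            // From here on, drawing calls use slide coordinates
            g.scale(scale, scale);
            painter.accept(g);
        } finally {
            g.dispose();
        }
        return img;
    }

    public static void main(String[] args) {
        // Assumed 720 x 540 point slide, rendered at twice the resolution
        BufferedImage img = render(720, 540, 2.0, g -> {
            g.setColor(Color.RED);
            g.fillRect(100, 100, 200, 100);
        });
        System.out.println(img.getWidth() + " x " + img.getHeight());
    }
}
```

Because the image added to the PDF is still scaled to the page size, a larger source image simply gives the PDF more pixels to work with when it is later rendered at 300 DPI. The trade-off is a bigger PDF file and more memory per slide.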

  2. Convert PDF to Images

Next, we'll convert the PDF to images using PDFBox.

Step 1: Loading the PDF File

We start by loading the PDF document into a PDDocument object and creating a PDFRenderer for it.

try (PDDocument document = PDDocument.load(new File(pdfFileName))) {
    PDFRenderer pdfRenderer = new PDFRenderer(document);

Step 2: Rendering Each Page to an Image

For each page in the PDF document, we render it to a BufferedImage with 300 DPI resolution and save it as a PNG file.

for (int page = 0; page < document.getNumberOfPages(); ++page) {
    BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300);
    String outputFileName = pdfFileName.replace(".pdf", "") + "-" + (page + 1) + ".png";
    ImageIOUtil.writeImage(bim, outputFileName, 300);
}
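
A small caveat on the file naming above: String.replace removes every occurrence of ".pdf" in the name, not just the extension. If that matters for your file names, one option is a helper along these lines, which strips only a trailing extension. PageImageNames and forPage are hypothetical names used for this sketch.

```java
import java.util.Locale;

public class PageImageNames {
    /**
     * Builds the PNG file name for a zero-based page index,
     * removing only a trailing ".pdf" (in any letter case) from the source name.
     */
    public static String forPage(String pdfFileName, int pageIndex) {
        String base = pdfFileName;
        if (base.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            base = base.substring(0, base.length() - ".pdf".length());
        }
        // Page numbers in file names start at 1
        return base + "-" + (pageIndex + 1) + ".png";
    }

    public static void main(String[] args) {
        System.out.println(forPage("presentation.pdf", 0)); // presentation-1.png
        System.out.println(forPage("my.pdf.notes.pdf", 2)); // my.pdf.notes-3.png
    }
}
```

Inside the rendering loop, the result of forPage(pdfFileName, page) could be passed to ImageIOUtil.writeImage in place of the concatenated string.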

Conclusion

In this blog post, we demonstrated how to convert a PowerPoint presentation to a PDF and then convert that PDF into images using Java. With the Apache POI, iText, and PDFBox libraries, both conversions fit in a single short program. I hope you found this guide helpful and that it inspires you to explore further possibilities with these libraries.

Feel free to leave comments or questions below!