Convert PDF to black and white PNGs

2023-09-02 02:05:38 Author: 巷口酒肆

I'm trying to compress PDFs using iTextSharp. There are a lot of pages with color images stored as JPEGs (DCTDecode), so I'm converting them to black and white PNGs and replacing them in the document (a PNG is much smaller than a JPG for black and white images).

I have the following methods:

    private static bool TryCompressPdfImages(PdfReader reader)
    {
        try
        {
            int n = reader.XrefSize;
            for (int i = 0; i < n; i++)
            {
                PdfObject obj = reader.GetPdfObject(i);
                if (obj == null || !obj.IsStream())
                {
                    continue;
                }

                var dict = (PdfDictionary)PdfReader.GetPdfObject(obj);
                var subType = (PdfName)PdfReader.GetPdfObject(dict.Get(PdfName.SUBTYPE));
                if (!PdfName.IMAGE.Equals(subType))
                {
                    continue;
                }

                var stream = (PRStream)obj;
                try
                {
                    var image = new PdfImageObject(stream);

                    Image img = image.GetDrawingImage();
                    if (img == null) continue;

                    using (img)
                    {
                        int width = img.Width;
                        int height = img.Height;

                        using (var msImg = new MemoryStream())
                        using (var bw = img.ToBlackAndWhite())
                        {
                            bw.Save(msImg, ImageFormat.Png);
                            msImg.Position = 0;
                            stream.SetData(msImg.ToArray(), false, PdfStream.NO_COMPRESSION);
                            stream.Put(PdfName.TYPE, PdfName.XOBJECT);
                            stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
                            stream.Put(PdfName.FILTER, PdfName.FLATEDECODE);
                            stream.Put(PdfName.WIDTH, new PdfNumber(width));
                            stream.Put(PdfName.HEIGHT, new PdfNumber(height));
                            stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
                            stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
                            stream.Put(PdfName.LENGTH, new PdfNumber(msImg.Length));
                        }
                    }
                }
                catch (Exception ex)
                {
                    Trace.TraceError(ex.ToString());
                }
                finally
                {
                    // may or may not help      
                    reader.RemoveUnusedObjects();
                }
            }
            return true;
        }
        catch (Exception ex)
        {
            Trace.TraceError(ex.ToString());
            return false;
        }
    }

    public static Image ToBlackAndWhite(this Image image)
    {
        image = new Bitmap(image);
        using (Graphics gr = Graphics.FromImage(image))
        {
            var grayMatrix = new[]
            {
                new[] {0.299f, 0.299f, 0.299f, 0, 0},
                new[] {0.587f, 0.587f, 0.587f, 0, 0},
                new[] {0.114f, 0.114f, 0.114f, 0, 0},
                new [] {0f, 0, 0, 1, 0},
                new [] {0f, 0, 0, 0, 1}
            };

            var ia = new ImageAttributes();
            ia.SetColorMatrix(new ColorMatrix(grayMatrix));
            ia.SetThreshold((float)0.8); // Change this threshold as needed
            var rc = new Rectangle(0, 0, image.Width, image.Height);
            gr.DrawImage(image, rc, 0, 0, image.Width, image.Height, GraphicsUnit.Pixel, ia);
        }
        return image;
    }

I've tried varieties of COLORSPACEs and BITSPERCOMPONENTs, but always get "Insufficient data for an image", "Out of memory", or "An error exists on this page" upon trying to open the resulting PDF...so I must be doing it wrong. I'm pretty sure FLATEDECODE is the right thing to use.

Any assistance would be much appreciated.

Recommended answer

The problem:

You have a PDF with a colored JPG. For instance: image.pdf

If you look inside this PDF, you'll see that the filter of the image stream is /DCTDecode and the color space is /DeviceRGB.

Now you want to replace the image in the PDF, so that the result looks like this: image_replaced.pdf

In this PDF, the filter is /FlateDecode and the color space is changed to /DeviceGray.

In the conversion process, you want to use the PNG format.

The example:

I have made you an example that makes this conversion: ReplaceImage

I will explain this example step by step:

Step 1: Find the image

In my example, I know that there's only one image, so I'm retrieving the PRStream with the image dictionary and the image bytes in a quick and dirty way.

PdfReader reader = new PdfReader(src);
PdfDictionary page = reader.getPageN(1);
PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
PdfDictionary xobjects = resources.getAsDict(PdfName.XOBJECT);
PdfName imgRef = xobjects.getKeys().iterator().next();
PRStream stream = (PRStream) xobjects.getAsStream(imgRef);

I go to the /XObject dictionary in the /Resources listed in the page dictionary of page 1. I take the first XObject I encounter, assuming that it is an image, and I get that image as a PRStream object.

Your code is better than mine, but this part of the code isn't relevant to your question and it works in the context of my example, so let's ignore the fact that this won't work for other PDFs. What you really care about are steps 2 and 3.

Step 2: Convert the colored JPG into a black and white PNG

Let's write a method that takes a PdfImageObject, converts its colors to gray, stores the result as a PNG, and returns it as an Image object:

public static Image makeBlackAndWhitePng(PdfImageObject image) throws IOException, DocumentException {
    BufferedImage bi = image.getBufferedImage();
    BufferedImage newBi = new BufferedImage(bi.getWidth(), bi.getHeight(), BufferedImage.TYPE_USHORT_GRAY);
    newBi.getGraphics().drawImage(bi, 0, 0, null);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ImageIO.write(newBi, "png", baos);
    return Image.getInstance(baos.toByteArray());
}

We convert the original image into a black and white image using standard BufferedImage manipulations: we draw the original image bi to a new image newBi of type TYPE_USHORT_GRAY.
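
The BufferedImage type chosen for the new image decides what "black and white" means in practice. As a minimal sketch that uses only the JDK (the class GrayConversionSketch and its helper names are made up for illustration and are not part of iText or the ReplaceImage example), the following redraws a color image as 16-bit gray, 8-bit gray, and 1-bit bilevel, then encodes each result as PNG:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper class for illustration; not part of iText.
final class GrayConversionSketch {

    // Builds a small color test image so the sketch needs no input file.
    static BufferedImage sampleColorImage() {
        BufferedImage img = new BufferedImage(64, 48, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.ORANGE);
            g.fillRect(0, 0, 64, 48);
            g.setColor(Color.BLUE);
            g.fillRect(16, 12, 32, 24);
        } finally {
            g.dispose();
        }
        return img;
    }

    // Redraws src into a new image of the requested BufferedImage type.
    static BufferedImage redraw(BufferedImage src, int targetType) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), targetType);
        Graphics2D g = out.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    // Encodes an image as PNG bytes with ImageIO.
    static byte[] toPng(BufferedImage img) {
        ImageIO.setUseCache(false); // global ImageIO setting: encode in memory only
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            if (!ImageIO.write(img, "png", baos)) {
                throw new IOException("No PNG writer available");
            }
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage color = sampleColorImage();
        int[] types = {
            BufferedImage.TYPE_USHORT_GRAY, // 16 bits per pixel, as in makeBlackAndWhitePng
            BufferedImage.TYPE_BYTE_GRAY,   // 8 bits per pixel
            BufferedImage.TYPE_BYTE_BINARY  // 1 bit per pixel, two colors only
        };
        String[] names = {"TYPE_USHORT_GRAY", "TYPE_BYTE_GRAY", "TYPE_BYTE_BINARY"};
        for (int i = 0; i < types.length; i++) {
            BufferedImage converted = redraw(color, types[i]);
            int bits = converted.getColorModel().getPixelSize();
            System.out.println(names[i] + ": " + bits + " bits per pixel, PNG is "
                    + toPng(converted).length + " bytes");
        }
    }
}
```

TYPE_USHORT_GRAY keeps shades of gray at 16 bits per pixel. If the goal is a strictly two-color image, TYPE_BYTE_BINARY may be closer to what the question asks for, at the cost of losing all intermediate tones.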

Once this is done, you want the image bytes in the PNG format. This is also done using standard ImageIO functionality: we just write the BufferedImage to a byte array, telling ImageIO that we want "png".

We can use the resulting bytes to create an Image object.

Image img = makeBlackAndWhitePng(new PdfImageObject(stream));

Now we have an iText Image object, but please note that the image bytes as stored in this Image object are no longer in the PNG format. As already mentioned in the comments, PNG is not supported in PDF. iText will change the image bytes into a format that is supported in PDF (for more details see section 4.2.6.2 of The ABC of PDF).
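
To illustrate why PNG file bytes cannot simply be stored in a PDF image stream, here is a minimal JDK-only sketch (the class PngVersusFlateSketch and its method names are hypothetical, and it does not reproduce what iText does internally). A PNG file is a container with a signature and chunks, whereas /FlateDecode expects a plain zlib stream, produced here from raw gray samples with java.util.zip.Deflater:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import javax.imageio.ImageIO;

// Hypothetical helper class for illustration; not part of iText.
final class PngVersusFlateSketch {

    private static final int[] PNG_SIGNATURE = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    // True if the data starts with the 8-byte PNG file signature.
    static boolean hasPngSignature(byte[] data) {
        if (data.length < PNG_SIGNATURE.length) return false;
        for (int i = 0; i < PNG_SIGNATURE.length; i++) {
            if ((data[i] & 0xFF) != PNG_SIGNATURE[i]) return false;
        }
        return true;
    }

    // Encodes an image as a complete PNG file in memory.
    static byte[] toPng(BufferedImage img) {
        ImageIO.setUseCache(false); // global ImageIO setting: encode in memory only
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            if (!ImageIO.write(img, "png", baos)) throw new IOException("No PNG writer available");
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns one byte per pixel: the raw 8-bit gray samples, row by row.
    static byte[] graySamples(BufferedImage src) {
        BufferedImage gray = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return ((DataBufferByte) gray.getRaster().getDataBuffer()).getData().clone();
    }

    // Compresses raw bytes into a zlib stream.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflates a zlib stream; returns null if the data is not a complete zlib stream.
    static byte[] inflateOrNull(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) return null;
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            return null;
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
        byte[] png = toPng(src);
        byte[] flate = deflate(graySamples(src));
        System.out.println("PNG signature present: " + hasPngSignature(png));
        System.out.println("PNG inflates as zlib:  " + (inflateOrNull(png) != null));
        System.out.println("Samples inflate back:  " + (inflateOrNull(flate) != null));
    }
}
```

Under these assumptions, bytes written by an image library's PNG encoder would need to be unpacked into raw samples before they could serve as /FlateDecode data, which is the kind of conversion the Image object handles for you.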

Step 3: Replace the original image stream with the new image stream

We now have an Image object, but what we really need is to replace the original image stream with a new one and we also need to adapt the image dictionary as /DCTDecode will change into /FlateDecode, /DeviceRGB will change into /DeviceGray, and the value of the /Length will also be different.
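
The dictionary entries also determine how many bytes a viewer expects once the stream is decoded. A small worked sketch, assuming the sample layout used by PDF image XObjects in which each row is padded to a whole byte (the class and method names are hypothetical):

```java
// Hypothetical helper class for illustration; not part of iText.
final class ImageDataSizeSketch {

    // Number of decoded bytes implied by /Width, /Height, the number of
    // color components (3 for /DeviceRGB, 1 for /DeviceGray) and /BitsPerComponent.
    static long expectedDecodedBytes(int width, int height, int components, int bitsPerComponent) {
        long bitsPerRow = (long) width * components * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8; // each row starts on a byte boundary
        return bytesPerRow * height;
    }

    public static void main(String[] args) {
        // A 100 x 50 pixel image under three different dictionaries:
        System.out.println(expectedDecodedBytes(100, 50, 3, 8)); // /DeviceRGB, 8 bits: 15000
        System.out.println(expectedDecodedBytes(100, 50, 1, 8)); // /DeviceGray, 8 bits: 5000
        System.out.println(expectedDecodedBytes(100, 50, 1, 1)); // /DeviceGray, 1 bit: 650
    }
}
```

If the dictionary declares /DeviceRGB at 8 bits per component but the stream only decodes to gray samples (or does not decode at all), the viewer finds fewer bytes than expected, which is one plausible source of an "Insufficient data for an image" error.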

You are creating the image stream and its dictionary manually. That's brave. I leave this job to iText's PdfImage object:

PdfImage image = new PdfImage(makeBlackAndWhitePng(new PdfImageObject(stream)), "", null);

PdfImage extends PdfStream, and I can now replace the original stream with this new stream:

public static void replaceStream(PRStream orig, PdfStream stream) throws IOException {
    orig.clear();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    stream.writeContent(baos);
    orig.setData(baos.toByteArray(), false);
    for (PdfName name : stream.getKeys()) {
        orig.put(name, stream.get(name));
    }
}

The order in which you do things here is important. You don't want the setData() method to tamper with the length and the filter.

Step 4: Persist the document after replacing the stream

I guess it's not hard to figure this part out:

replaceStream(stream, image);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();

Caveat:

I am not a C# developer. I know PDF inside-out and I know Java.

If your problem is caused in step 2, then you'll have to post another question asking how to convert a colored JPEG image into a black and white PNG image.

If your problem is caused in step 3 (for instance because you are using /DeviceRGB instead of /DeviceGray), then this answer will solve your problem.