The page you are viewing is part of our 160,000 page PDF discussion forum archive spanning 1999-2011.
 

alchemist

Total Messages 5

Subject:Decoding Images

In general I've been quite successful in extracting images by navigating the iText (http://itextpdf.com/) data-structure, identifying the encoding and applying the appropriate decoder to the PdfStream.

I have a slightly more complex situation where it appears the images are in a PDF form whose stream is encoded with FlateDecode.

So I decode the FlateDecode data and end up with the following:

/GS5 gs
q
0 0 1 1 re
W n
/GS4 gs
0 J 0 j 4 M []0 d 0.0284 w
0 0 0 0.7 K
0.9539 0.5332 m
0.9891 0.5332 l
0.9891 0.5896 l
0.9697 0.6397 l
0.4126 0.6397 l
0.4126 0.337 l
0.9697 0.337 l
0.9912 0.3718 l
0.9912 0.4193 l
0.6447 0.4193 l
0.6792 0.5332 l
0.7494 0.5332 l
0.7494 0.4957 l
0.7814 0.4957 l
0.7814 0.545 l
0.8219 0.545 l
0.8219 0.4994 l
0.8557 0.4994 l
0.8557 0.5488 l
0.8982 0.5488 l
0.8982 0.4945 l
0.9315 0.4945 l
0.931 0.5327 l
s
0.7 0 0 0.45 k
0.0118 0.1781 m
0.0757 0.0868 0.1632 0.0305 0.2595 0.0305 c
0.356 0.0305 0.4433 0.0868 0.5073 0.1779 c
0.5073 0.8219 l
0.4433 0.9132 0.3559 0.9695 0.2595 0.9695 c
0.1631 0.9695 0.0757 0.9132 0.0118 0.822 c
0.0118 0.1842 l
f
0.0379 w
0 0 0 1 K
0.0118 0.1781 m
0.0757 0.0868 0.1632 0.0305 0.2595 0.0305 c
0.356 0.0305 0.4433 0.0868 0.5073 0.1779 c
0.5073 0.8219 l
0.4433 0.9132 0.3559 0.9695 0.2595 0.9695 c
0.1631 0.9695 0.0757 0.9132 0.0118 0.822 c
0.0118 0.1842 l
s
0 0 0 0.7 K
0.9776 0.5242 m
0.9776 0.4343 l
S
0 0 0 1 k
0.3677 0.5177 m
0.3677 0.3633 0.3193 0.2382 0.2597 0.2382 c
0.2001 0.2382 0.1517 0.3633 0.1517 0.5177 c
0.1517 0.672 0.2001 0.7972 0.2597 0.7972 c
0.3193 0.7972 0.3677 0.672 0.3677 0.5177 c
f
0.0265 w
0 0 0 0 K
0.3677 0.5177 m
0.3677 0.3633 0.3193 0.2382 0.2597 0.2382 c
0.2001 0.2382 0.1517 0.3633 0.1517 0.5177 c
0.1517 0.672 0.2001 0.7972 0.2597 0.7972 c
0.3193 0.7972 0.3677 0.672 0.3677 0.5177 c
s
0 0 0 0 k
0.2107 0.4173 m
0.2107 0.5479 l
0.3071 0.5479 l
0.3071 0.4173 l
0.3071 0.4152 l
0.3071 0.355 0.2855 0.3061 0.2589 0.3061 c
0.2323 0.3061 0.2107 0.355 0.2107 0.4152 c
f
0.231 0.5074 m
0.231 0.6125 l
0.231 0.6569 0.2449 0.693 0.2621 0.693 c
0.2793 0.693 0.2932 0.6569 0.2932 0.6125 c
0.3054 0.6125 l
0.3054 0.6743 0.286 0.7244 0.2621 0.7244 c
0.2382 0.7244 0.2188 0.6743 0.2188 0.6125 c
0.2188 0.5074 l
f
0 0 0 1 k
0.2733 0.4696 m
0.2733 0.4487 0.2667 0.4319 0.2587 0.4319 c
0.2506 0.4319 0.2441 0.4487 0.2441 0.4696 c
0.2441 0.4905 0.2506 0.5074 0.2587 0.5074 c
0.2667 0.5074 0.2733 0.4905 0.2733 0.4696 c
f
0.2695 0.3512 m
0.2478 0.3512 l
0.2516 0.4706 l
0.2664 0.4706 l
f
Q


1. Exactly what format is the above? Is it raw Ghostscript? I can't seem to read it with anything (GSview, Acrobat, etc.).

2. If it is an EPS (maybe it's something else?), what do I need to do to turn this into a viewable image?
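
For readers following along: FlateDecode is zlib/deflate compression, so the decoding step described above only undoes compression and says nothing about what kind of data comes out. A minimal Python sketch, assuming a stream with no predictor (no DecodeParms); the sample bytes and the `flate_decode` helper are hypothetical and not part of iText:

```python
import zlib

# Hypothetical sample: a few operators like the ones in the post, compressed
# the way a /Filter /FlateDecode stream stores them.
raw = b"q\n0 0 1 1 re\nW n\n0.9776 0.5242 m\n0.9776 0.4343 l\nS\nQ\n"
stored = zlib.compress(raw)


def flate_decode(data: bytes) -> bytes:
    """Undo FlateDecode (zlib/deflate). Predictors (DecodeParms) are not handled."""
    return zlib.decompress(data)


decoded = flate_decode(stored)
# The result is plain text made of operands and operators, not pixel data.
print(decoded.decode("latin-1"))
```

Whether the decoded bytes are image samples or drawing operators depends on the object's dictionary (for example /Subtype /Image versus /Subtype /Form), not on the filter.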


Posted: 15 Nov 2010 11:04 PM

aandi

Total Messages 17064

Subject:Decoding Images

Please see the PDF Reference. You cannot do this stuff by guessing!!


Posted: 15 Nov 2010 11:05 PM

alchemist

Total Messages 5

Subject:Decoding Images

Can you please be a little more verbose? I do not understand.


Posted: 15 Nov 2010 11:06 PM

aandi

Total Messages 17064

Subject:Decoding Images

This information is all clearly explained in the PDF Reference. Your question suggests you have not read this document. But it is absolutely necessary for your task to read the first chapters (as far as the chapter "Graphics").


Posted: 15 Nov 2010 11:12 PM

aandi

Total Messages 17064

Subject:Decoding Images

If I am wrong, and you have read the document, perhaps we can help you find out where you have gone wrong, because this is not an image. What makes you believe it is an image?
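
One way to check the "this is not an image" point is to look at which operators the decoded stream contains. The sketch below is a rough Python illustration, not a full content-stream parser: it ignores strings, inline image data, comments, and the `'` and `"` text operators, and the function name and category labels are made up for this example. Path construction and painting operators (m, l, c, re, S, f, and so on) indicate vector drawing; an image would show up as a `Do` on an image XObject or as an inline `BI ... ID ... EI` sequence.

```python
import re
from collections import Counter

# Names (/GS5), numbers, and bare operator keywords; everything else is skipped.
TOKEN = re.compile(
    r"(?P<name>/[^\s/\[\]()<>]*)"
    r"|(?P<number>[-+]?(?:\d+\.?\d*|\.\d+))"
    r"|(?P<op>[A-Za-z][A-Za-z0-9*]*)"
)

PATH_OPS = {"m", "l", "c", "v", "y", "h", "re", "S", "s", "f", "F", "f*",
            "B", "B*", "b", "b*", "n", "W", "W*"}
# Do paints any XObject (image or form); BI/ID/EI delimit an inline image.
XOBJECT_OR_IMAGE_OPS = {"Do", "BI", "ID", "EI"}


def classify_operators(stream: str) -> Counter:
    """Count operators by rough category: 'path', 'image', or 'other'."""
    counts = Counter()
    for match in TOKEN.finditer(stream):
        op = match.group("op")
        if op is None:
            continue  # operand (name or number), not an operator
        if op in PATH_OPS:
            counts["path"] += 1
        elif op in XOBJECT_OR_IMAGE_OPS:
            counts["image"] += 1
        else:
            counts["other"] += 1
    return counts


# Excerpt of the stream quoted in the first post.
sample = ("/GS5 gs\nq\n0 0 1 1 re\nW n\n0 0 0 0.7 K\n"
          "0.9776 0.5242 m\n0.9776 0.4343 l\nS\nQ\n")
print(classify_operators(sample))
```

For the excerpt, every drawing operator falls in the path category and none in the image category, which matches the reading of the stream as vector art.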


Posted: 15 Nov 2010 11:17 PM

alchemist

Total Messages 5

Subject:Decoding Images

Okay, thanks... you are absolutely correct, I haven't read this document. Where can I download or get a copy of the "PDF Reference"? I assume it's produced by Adobe?


Posted: 15 Nov 2010 11:20 PM

alchemist

Total Messages 5

Subject:Decoding Images

...is this the one?

http://www.adobe.com/devnet/pdf/pdf_reference.html


Posted: 15 Nov 2010 11:22 PM

aandi

Total Messages 17064

Subject:Decoding Images

Yes, that's it. It's now an ISO standard, but you can get it free from Adobe.

(This is very helpful of Adobe. Usually these open standards are expensive: 380 Swiss francs for ISO 32000-1:2008.)


Posted: 15 Nov 2010 11:45 PM

alchemist

Total Messages 5

Subject:Decoding Images

It's also likely that I'm going completely down the wrong path, so I'll explain my problem in more detail.

I am trying to extract the low-res FPO images that appear in a PDF document (those appearing with OPI comments).

For this particular document (I chose this one at random from the internet) I have problems:

http://media.wiley.com/product_data/excerpt/18/EHEP0000/EHEP000018-1.pdf

I know the low-res image exists because I can see it on the page, so it must be in the file somewhere. Note that the F PdfName refers to the high-res image (which I don't care about for now).

This is the object which is identified (for page 3) as the OPI reference for the image. Normally it would have a reference to an XObject (which is the low-res image), but I can't find one, so I assumed the image was in a form connected to this object:

25 0 obj<>>>/Subtype/Form/Length 70/Filter/FlateDecode/Name/Fm20/Matrix[1 0 0 1 0 0]/Resources<>/ProcSet[/PDF]/ExtGState<>>>/Type/XObject/BBox[0 0 1332.0 783.0]/FormType 1>>stream


Posted: 15 Nov 2010 11:54 PM
Originally Posted: 15 Nov 2010 11:50 PM

aandi

Total Messages 17064

Subject:Decoding Images

It's easy to miss that an OPI entry can appear in either an image dictionary (what you expect) or a form XObject (what you find).

A form XObject is a general collection of page marking operators, the same ones found on the page itself, with its own resources. These resources can contain images, which will be rendered as well as the vector art. They can also contain nested form XObjects, and the nesting can be deep. These nested forms can in turn contain their own OPI entries, and you may well find this case.
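
The nesting described above suggests a recursive walk over each form's resources rather than a single lookup. The following Python sketch models PDF dictionaries as plain Python dicts purely for illustration; the function name, the sample objects, and the placeholder OPI contents are hypothetical and do not correspond to the iText API or to the Wiley file. It assumes the structure has no cycles.

```python
def walk_xobjects(resources: dict, path: str = ""):
    """Yield (path, subtype, has_opi) for every XObject, descending into forms."""
    for name, xobj in resources.get("XObject", {}).items():
        here = f"{path}/{name}"
        yield here, xobj.get("Subtype"), "OPI" in xobj
        if xobj.get("Subtype") == "Form":
            # A form carries its own Resources, which may hold images
            # and further nested forms.
            yield from walk_xobjects(xobj.get("Resources", {}), here)


# Hypothetical page resources: an OPI-tagged form wrapping one image,
# plus a vector-only form that contains no image data at all.
page_resources = {
    "XObject": {
        "Fm20": {
            "Subtype": "Form",
            "OPI": {"2.0": {}},  # placeholder, real contents omitted
            "Resources": {
                "XObject": {"Im1": {"Subtype": "Image", "Width": 8, "Height": 8}}
            },
        },
        "Fm21": {"Subtype": "Form", "OPI": {"2.0": {}}, "Resources": {}},
    }
}

for found in walk_xobjects(page_resources):
    print(found)
```

In this made-up example the OPI entry sits on the forms, the only image is one level down inside `Fm20`, and `Fm21` is an OPI-tagged form with no image at all.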

In general, then, an OPI dictionary is associated with some drawing on the page, which might be an image, or might use the full range of vector+image+text available in PDF. No image data is stored for this case (it isn't needed).


Posted: 16 Nov 2010 01:34 AM