
Pdf Hacks


Hack 8 Convert PDF Documents to Word


Automatically scrape clipboard data into a new Word document.

In general, PDFs aren't as smart as they appear. Unless they are tagged [Hack #34], they have no concept of paragraph, table, or column. This becomes a problem only when you must create a new document using material from an old document. Ideally, you would use the old document's source file, or maybe even its HTML edition. This isn't always possible, however. Sometimes you have only a PDF to work with.

1.9.1 Save As . . . DOC, RTF, HTML

Adobe Acrobat 6 enables you to convert your PDF to many different formats with the Save As . . . dialog. These filters work best when the PDF is tagged. Try one to see if it suits your requirements. Adobe Reader enables you to convert your PDF to text by selecting File → Save As Text . . . .

If your PDF is not tagged, Acrobat uses an inference engine to assemble the letters into words and the words into paragraphs. It tries to detect and create tables. It works best on documents with very simple formatting. Tables and formatted pages generally don't survive.

1.9.2 The Human Touch

Fully automatic conversion of PDF to a structured format such as Word's DOC is not generally possible: an untagged PDF carries no paragraph, table, or column information, so a converter must guess at all of it. One workaround is to break the problem down to the point where the automation has a chance. The TAPS tool [Hack #7] works well because you meet the automation halfway: you tell it where the table is, and it creates a table from the given data. This approach can be scaled to fit the larger problem of converting entire documents.
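As a rough illustration of meeting the automation halfway, here is a minimal VBA sketch of a Word-side helper. It is not TAPS and not part of this hack's code; the function name SpacesToTabs is hypothetical. It assumes you have selected only table rows, and that the copied text separates columns with runs of two or more spaces, which will not hold for every PDF.

```vba
Option Explicit

' Hypothetical helper (not from the original hack): turns one copied table
' row into tab-separated cells. Assumption: columns are separated by runs
' of two or more spaces, and single spaces belong to the cell text.
Function SpacesToTabs(ByVal RowText As String) As String
    Dim Result As String
    Dim RunLength As Long   ' number of consecutive spaces seen so far
    Dim Ch As String
    Dim i As Long

    RowText = Trim$(RowText)
    For i = 1 To Len(RowText)
        Ch = Mid$(RowText, i, 1)
        If Ch = " " Then
            RunLength = RunLength + 1
        Else
            If RunLength = 1 Then
                Result = Result & " "      ' ordinary space inside a cell
            ElseIf RunLength > 1 Then
                Result = Result & vbTab    ' column gap
            End If
            RunLength = 0
            Result = Result & Ch
        End If
    Next i

    SpacesToTabs = Result
End Function
```

Once the rows are tab-separated, Word's own Convert Text to Table feature (Range.ConvertToTable with wdSeparateByTabs) can build the table. You supply the judgment about where the table is; the code does only the mechanical part.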

1.9.3 Scrape the Clipboard into a New Document with AutoPasteLoop

Copy/Paste works fine for a few items, but it grows cumbersome when processing several pages of data. AutoPasteLoop is a Word macro that watches the clipboard for new data and then immediately pastes it into your new document. Instead of copy/paste, copy/paste, copy/paste, you can just copy, copy, copy. Word automatically pastes, pastes, pastes.

Scott Tupaj has ported AutoPasteLoop to OpenOffice. Download the code from http://www.pdfhacks.com/autopaste/.


Create a new Word macro named AutoPasteLoop in Normal.dot and program it like this:

'AutoPasteLoop, version 1.0
'Visit: http://www.pdfhacks.com/autopaste/
'
'Start AutoPasteLoop from MS Word and switch to Adobe Reader or Acrobat.
'Copy the material you want, and AutoPasteLoop will automatically
'paste it into the target Word document.  When you are done, switch back
'to MS Word and AutoPasteLoop will stop.

Option Explicit

' declare Win32 API functions that we need
' (Sleep returns no value, so it is declared as a Sub)
Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Declare Function GetForegroundWindow Lib "user32" () As Long
Declare Function GetOpenClipboardWindow Lib "user32" () As Long
Declare Function GetClipboardOwner Lib "user32" () As Long

Sub AutoPasteLoop()
    'the HWND of the application we're pasting into (MS Word)
    Dim AppHwnd As Long
    'assume that we are executed from the target app.
    AppHwnd = GetForegroundWindow()

    'keep track of whether the user switches out
    'of the target application (MS Word).
    Dim SwitchedApp As Boolean
    SwitchedApp = False

    'reset this to stop looping
    Dim KeepLooping As Boolean
    KeepLooping = True

    'the HWND of our target document; GetClipboardOwner returns the
    'HWND of the window that most recently owned the clipboard;
    'changing the clipboard's contents (Cut) makes us the "owner"
    '
    'note that "owning" the clipboard doesn't mean that it's locked
    '
    Dim DocHwnd As Long
    Selection.TypeText Text:="abc"
    Selection.MoveLeft Unit:=wdCharacter, Count:=3, Extend:=wdExtend
    Selection.Cut
    DocHwnd = GetClipboardOwner()

    Do While KeepLooping
        Sleep 200 'milliseconds, i.e., 1/5 second

        'if the user switches away from the target
        'application and then switches back, stop looping
        '
        Dim ActiveHwnd As Long
        ActiveHwnd = GetForegroundWindow()
        If ActiveHwnd = AppHwnd Then
            If SwitchedApp Then KeepLooping = False
        Else
            SwitchedApp = True
        End If

        'if the clipboard owner has changed, then somebody else
        'has put something on it; if the clipboard resource isn't
        'locked (GetOpenClipboardWindow), then paste its contents
        'into our document; use Copy to change the clipboard owner
        'back to DocHwnd
        '
        If GetClipboardOwner() <> DocHwnd And _
        GetOpenClipboardWindow() = 0 Then
            Selection.Paste
            Selection.MoveLeft Unit:=wdCharacter, Count:=1, Extend:=wdExtend
            Selection.Copy
            Selection.Collapse wdCollapseEnd
        End If
    Loop
End Sub
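The Declare statements in the listing target 32-bit Word. On 64-bit Office (VBA7), Declare statements must carry the PtrSafe keyword, and window handles are pointer-sized. The following is a minimal sketch of conditional declarations you might substitute. It is an assumption about how to adapt the macro, not part of AutoPasteLoop 1.0. If you use it, the three HWND variables (AppHwnd, DocHwnd, and ActiveHwnd) would also need to be declared As LongPtr under VBA7.

```vba
' Hypothetical replacement for the four Declare lines in the listing
' (an adaptation for newer Office versions, not part of AutoPasteLoop 1.0).
#If VBA7 Then
    ' Office 2010 and later: PtrSafe is required on 64-bit Office,
    ' and HWND values are returned as LongPtr.
    Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
    Declare PtrSafe Function GetForegroundWindow Lib "user32" () As LongPtr
    Declare PtrSafe Function GetOpenClipboardWindow Lib "user32" () As LongPtr
    Declare PtrSafe Function GetClipboardOwner Lib "user32" () As LongPtr
#Else
    ' Older, 32-bit-only versions of VBA.
    Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
    Declare Function GetForegroundWindow Lib "user32" () As Long
    Declare Function GetOpenClipboardWindow Lib "user32" () As Long
    Declare Function GetClipboardOwner Lib "user32" () As Long
#End If
```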

1.9.4 Running AutoPasteLoop

Open a new Word document. Start AutoPasteLoop by opening the Macros dialog box (Tools → Macro → Macros . . . ), selecting the macro name AutoPasteLoop, and clicking Run. While the loop is running, you cannot interact with Word. Stop the loop by switching to another application and then switching back to Word.

Start the loop. Switch to Acrobat (or Reader) and use its tools to individually select and copy its columns, tables, paragraphs, and images. Switch back to Word and you should find all of your selections pasted into the new document. Start AutoPasteLoop again if you want to copy more material.

1.9.5 Hacking AutoPasteLoop

Add content filters or your own inference logic to the AutoPasteLoop macro. Use your knowledge of the input documents to tailor the loop, so it creates documents that require less postprocessing.
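One possible content filter is sketched below. Text copied from a PDF often arrives with a line break at the end of every visual line, so this hypothetical function, UnwrapLines, joins those lines back into paragraphs. It assumes that a blank line marks a real paragraph break and that every other line break is just a wrap. That heuristic will misfire on lists and headings, so treat it as a starting point rather than a finished filter.

```vba
Option Explicit

' Hypothetical content filter (not part of AutoPasteLoop 1.0): joins the
' hard-wrapped lines of a copied block into paragraphs. A blank line is
' treated as a paragraph break; every other line break is assumed to be
' a wrap left over from the PDF page layout.
Function UnwrapLines(ByVal Raw As String) As String
    Dim Parts() As String
    Dim Result As String
    Dim Piece As String
    Dim AtParagraphStart As Boolean
    Dim i As Long

    ' normalize CR+LF and bare CR (Word's paragraph mark) to LF
    Raw = Replace(Raw, vbCrLf, vbLf)
    Raw = Replace(Raw, vbCr, vbLf)
    Parts = Split(Raw, vbLf)

    AtParagraphStart = True
    For i = LBound(Parts) To UBound(Parts)
        Piece = Trim$(Parts(i))
        If Len(Piece) = 0 Then
            ' blank line: close the current paragraph, if one is open
            If Not AtParagraphStart Then
                Result = Result & vbCr
                AtParagraphStart = True
            End If
        Else
            ' non-blank line: append it, separated by one space
            If Not AtParagraphStart Then Result = Result & " "
            Result = Result & Piece
            AtParagraphStart = False
        End If
    Next i

    UnwrapLines = Result
End Function
```

For example, UnwrapLines("a" & vbCrLf & "b" & vbCrLf & vbCrLf & "c") returns "a b" & vbCr & "c". To use it from the loop you would need to capture the pasted text and write the filtered text back, which discards character formatting and images, so it suits text-only material.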

AutoPasteLoop isn't just a PDF hack. It works with any program that can copy content to the clipboard.
