Nov 8, 2009

Document Digitization - How to approach this?


Latest entry: post #11 by yordan, Aug 14 2009, 04:56 PM.


FirefoxRocks
I have been wondering if there is a fast way of doing this. I have 3 piles of documents that I would like to store digitally on the computer as PDFs or some other format. That comes to approximately 3000 documents, maybe a bit fewer.

The relevant hardware and software I have:
  • Lexmark X5070 all-in-one
  • Windows XP SP3/Windows Vista SP1/Windows 7 Beta/Ubuntu 9
  • PDF printer software driver
  • Anything that came with the all-in-one
  • Windows Live Photo Gallery
  • I can also download free software from Download.com if necessary

The whole process of scanning a document, waiting for the computer to process the image, and saving it takes about 50 seconds per document. Working non-stop, I estimate this will take at least 42 hours.
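The 42-hour figure checks out: 3000 documents × 50 s = 150,000 s, or about 41.7 hours. A small Python sketch of the same estimate, so other per-page times or document counts can be plugged in (the function name is only illustrative):

```python
def estimate_hours(num_docs: int, seconds_per_doc: float) -> float:
    """Wall-clock hours for scanning num_docs documents one at a time."""
    return num_docs * seconds_per_doc / 3600


if __name__ == "__main__":
    # Figures from the post: about 3000 documents at roughly 50 s each.
    print(f"{estimate_hours(3000, 50):.1f} hours")  # 41.7 hours
    # Same per-page time with fewer documents.
    print(f"{estimate_hours(2000, 50):.1f} hours")
```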

I was wondering if there is a way for the all-in-one to take, say, 60 pages, scan them one by one, and save them as image1.png, image2.png, or similar. It needs to be automatic so I can leave it unattended for an hour instead of sitting there feeding in pages and watching the progress bar over and over.
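The loop being asked for is simple once something can pull one page from the feeder. A minimal Python sketch, assuming a hypothetical `scan_page()` callable that wraps whatever driver or tool actually talks to the scanner (on Ubuntu, SANE's `scanimage` has a `--batch` mode for document feeders, if the device is supported) and returns one page as PNG bytes, or `None` once the tray is empty:

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def scan_batch(
    scan_page: Callable[[], Optional[bytes]],
    out_dir: Path,
    max_pages: int = 60,
    prefix: str = "image",
    start: int = 1,
) -> List[Path]:
    """Scan pages until the feeder is empty or max_pages is reached.

    scan_page is a hypothetical stand-in for the real driver: it returns
    one page as PNG bytes, or None when no paper is left.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for number in range(start, start + max_pages):
        data = scan_page()
        if data is None:  # feeder is empty: stop instead of waiting
            break
        # Zero-padded numbers (image0001.png, ...) keep the files in scan
        # order when a file manager sorts them alphabetically.
        path = out_dir / f"{prefix}{number:04d}.png"
        path.write_bytes(data)
        saved.append(path)
    return saved


if __name__ == "__main__":
    # Demo with a fake "scanner" that has three pages loaded.
    pages = iter([b"page one", b"page two", b"page three"])
    with tempfile.TemporaryDirectory() as tmp:
        for p in scan_batch(lambda: next(pages, None), Path(tmp)):
            print(p.name)
```

The `start` argument lets the numbering continue across loads (a second stack of 60 could start at 61), and stopping on an empty feeder means the script never needs to know the page count in advance.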

The documents contain a lot of text (right now I'm not concerned with newspaper clippings), and typing them up would take even longer. All I need is automatically created PNGs or PDFs; I'll sort through them in My Documents afterwards. The first priority is getting rid of the physical papers and recycling them!!

Any ideas on how to do this?


yordan
The real question is: what form are your documents in? If they are books, can you cut the bindings to get loose leaves?
I would say that, for your problem, the easiest way is to have a professional service do it.
The last high-end professional scanner I used was rated at 5000 documents per minute (less than one second per document, both sides). And of course everything went into one giant PDF or Microsoft Word file...
Just have a look at professional scanners' specs; they're really impressive.
Of course, only large organizations (like the national Social Security administration) can pay $40,000 (yes, forty thousand dollars) for a scanner. That's why I say: don't try it with your small home toy, have it done on a real scanner.


Spencer
5000, five thousand documents per minute? That's like flipping through a book and the scanning is done. Amazing; the hardware would have to be really high-end to process something like that, and it would also need fast hard disks to write the data quickly enough. I hope somebody can suggest a better solution for digitizing those texts. This is a somewhat different topic, but you could look at how Project Gutenberg handles its scanned copies; maybe somebody on those forums can help.


yordan
Sorry, it's per hour; I must be falling asleep. And yes, you need fast disks to keep up with that.
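The corrected figure also fits the earlier parenthetical: 5000 documents per hour is 3600 / 5000 = 0.72 s per document, whereas 5000 per minute would be 12 ms each. A quick Python check (the helper names are illustrative), which also shows that about 3000 documents would take roughly 0.6 hours at 5000 per hour:

```python
def seconds_per_doc(docs: int, period_seconds: float) -> float:
    """Average time per document for a throughput of docs per period_seconds."""
    return period_seconds / docs


def hours_for(num_docs: int, docs: int, period_seconds: float) -> float:
    """Hours needed for num_docs at a rate of docs per period_seconds."""
    return num_docs * seconds_per_doc(docs, period_seconds) / 3600


if __name__ == "__main__":
    print(seconds_per_doc(5000, 3600))  # 5000 per hour   -> 0.72 s each
    print(seconds_per_doc(5000, 60))    # 5000 per minute -> 0.012 s each
    print(hours_for(3000, 5000, 3600))  # ~3000 documents at 5000 per hour
```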


FirefoxRocks
It isn't books; 90% of it is regular 8.5" × 11" letter-size paper. Some are 11" × 17", but those are not that common.

Anyway, I have found a solution that creates one large PDF file of approximately 100 sheets at a time.


mastercomputers
Hey FirefoxRocks,

There are solutions, although the files these programs create are very likely to be huge; that's to be expected.

There's DocsVault, or the open-source KnowledgeTree.

These eliminate scanning and saving each page one at a time. You can scan all your documents and stack them page by page inside one file, in different file formats, or scan them all individually and combine them afterwards, and there are probably other features I haven't looked into yet.

There's no need for a PDF printer; these programs can produce PDFs from the scanned files. Depending on the quality you need, scanning at the lowest resolution in black and white will make the files smaller, but readability may suffer.


Cheers,


MC
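MC's point about resolution and color is easy to quantify. At 300 dpi a letter page is 2550 × 3300 pixels, so 1-bit black and white is about 1.05 MB raw versus about 25 MB for 24-bit color, a 24× difference. A rough Python sketch of that uncompressed size (the function is illustrative; real PDF or TIFF files are usually much smaller because scanning and document software compresses the image, especially black-and-white pages):

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one scanned page, ignoring row padding and headers."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # US letter (8.5" x 11") at a few common scan settings.
    settings = [
        ("black & white, 300 dpi", 300, 1),
        ("grayscale, 300 dpi", 300, 8),
        ("color, 300 dpi", 300, 24),
        ("black & white, 150 dpi", 150, 1),
    ]
    for label, dpi, bpp in settings:
        mb = raw_page_bytes(8.5, 11, dpi, bpp) / 1_000_000
        print(f"{label}: {mb:.2f} MB per page before compression")
```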


Tian
If the scanner bed is big enough, you can put several documents on it at one time, say 10. That would at least cut your labor to about a tenth, since 90% of your documents are regular 8.5" × 11" letter-size paper. I'm also very curious about the solution you found; could you share it with us? Thank you!


FirefoxRocks
QUOTE (Tian @ Jun 14 2009, 10:25 PM)
If the scanner bed is big enough, you can put several documents on it at one time, say 10. That would at least cut your labor to about a tenth, since 90% of your documents are regular 8.5" × 11" letter-size paper. I'm also very curious about the solution you found; could you share it with us? Thank you!

The solution: I am requesting permission from the administration to use the school photocopier, which can take approximately 100 documents at a time and create a 100-page PDF file. The PDF is emailed to me when it is complete. This way I will only have to reload the documents 20 times or so (I found out I had fewer documents than estimated).


yordan
Congrats. You found the real solution.
A big photocopier very often has an embedded scanner, and some of them can save the scanned file, so yes, it's a smart solution.


FirefoxRocks
QUOTE (yordan @ Jun 17 2009, 03:01 AM)
Congrats. You found the real solution.
A big photocopier very often has an embedded scanner, and some of them can save the scanned file, so yes, it's a smart solution.

It turned out to take much longer than I expected, for two reasons:
  1. Crinkled or ripped paper jams easily in the feeder.
  2. I didn't know that so many of my documents had staples in them.

Nonetheless, it is quite fast, and now I have to download 50 PDF files and split them apart.
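Splitting the batch PDFs comes down to knowing how many pages each original document had. A Python sketch of one possible approach (not the method used in this thread), assuming you note those page counts while sorting, since they are not stored in the scan, and using the third-party pypdf library for the actual page copying; `split_ranges` and `write_pieces` are hypothetical helper names:

```python
from typing import List, Tuple


def split_ranges(page_counts: List[int]) -> List[Tuple[int, int]]:
    """Turn per-document page counts into half-open (start, stop) page ranges.

    Page indices are 0-based, matching how pypdf indexes reader.pages.
    """
    ranges: List[Tuple[int, int]] = []
    start = 0
    for count in page_counts:
        if count <= 0:
            raise ValueError("every document needs at least one page")
        ranges.append((start, start + count))
        start += count
    return ranges


def write_pieces(batch_pdf: str, page_counts: List[int], prefix: str = "doc") -> None:
    """Write one PDF per document: doc001.pdf, doc002.pdf, ..."""
    # Third-party dependency (pip install pypdf); imported here so that
    # split_ranges can be used and tested without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(batch_pdf)
    if sum(page_counts) != len(reader.pages):
        raise ValueError("page counts do not add up to the PDF's page count")
    for n, (start, stop) in enumerate(split_ranges(page_counts), start=1):
        writer = PdfWriter()
        for i in range(start, stop):
            writer.add_page(reader.pages[i])
        with open(f"{prefix}{n:03d}.pdf", "wb") as fh:
            writer.write(fh)
```

For example, `write_pieces("batch01.pdf", [1, 3, 2])` on a hypothetical 6-page batch would produce doc001.pdf (page 1), doc002.pdf (pages 2 to 4) and doc003.pdf (pages 5 and 6). Requiring the counts to add up exactly catches a miscounted document before any files are written.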


yordan
QUOTE (Atomic0 @ Jun 27 2009, 08:04 AM)
The real key to scanning a large quantity of loose leaf paper is a printer that has a document feeder

You probably meant to say "a scanner that has a document feeder". Some fast scanners have a document feeder without being printers, and some printers have a very slow scanner.
