sane-scan-pdf/README.md
Raman Gupta ded351edf4
Merge pull request #31 from p/readme-tips
Add helpful usage tips to the readme
2021-07-30 11:47:58 -04:00

131 lines
4.4 KiB
Markdown

# SANE Command-Line Scan to PDF
Sane command-line scanning bash shell script on Linux with OCR and deskew support. The script automates
common scan-to-pdf operations for scanners with an automatic document feeder, such as the awesome Fujitsu
ScanSnap S1500, with output to PDF files.
Tested and run regularly on Fedora, but should work on other distributions with the requirements below.
## Features
* Join scanned pages into a single output file, or specify a name for each page
* Deskew (if supported by scanner driver, or software-based via unpaper)
* Crop (if supported by scanner driver)
* Creates searchable PDFs (with tesseract)
* Duplex (if scanner supports it)
* Specify resolution
* Truncate n pages explicitly from end of scan e.g. duplex scanning with last page truncated
* Skip white-only pages automatically (with ImageMagick)
* Specify page width and height for odd size pages, or common sizes (Letter, Legal, A4)
* Performance: scanner run in parallel with page post-processing
* Limit parallel processing for very fast scanners or constrained environments (if sem installed)
## Requirements
The following dependencies are requirements of the script. See also [Dependencies
Installation](https://github.com/rocketraman/sane-scan-pdf/wiki/Dependencies-Installation).
* bash
* pnmtops (netpbm-progs)
* ps2pdf (ghostscript)
* pdfunite (poppler-utils)
* units (units)
* ImageMagick (if --skip-empty-pages or --ocr is used)
### Optional
* unpaper (for software deskew)
* flock (usually provided by util-linux) (for properly ordered verbose logs)
* tesseract (to make searchable PDFs)
* sem (via gnu-parallels, to constrain resource usage during page processing)
* bc (for whitepage detection percentage calculations)
## Getting Started
```
# scan --help
scan [OPTIONS]... [OUTPUT]
OPTIONS
-v, --verbose
Verbose output (this will slow down the scan due to the need to prevent interleaved output)
-d, --duplex
Duplex scanning
-m, --mode
Mode e.g. Lineart (default), Halftone, Gray, Color, etc. Use --mode-hw-default to not set any mode
--mode-hw-default
Do not set the mode explicitly, use the hardware default
-r, --resolution
Resolution e.g 300 (default)
-a, --append
Append output to existing scan
-e, --max <pages>
Max number of pages e.g. 2 (default is all pages)
-t, --truncate <pages>
Truncate number of pages from end e.g. 1 (default is none)
-s, --size
Page Size as type e.g. Letter (default), Legal, A4, no effect if --crop is specified
-ph, --page-height
Custom Page Height in mm
-pw, --page-width
Custom Page Width in mm
-x, --device
Override scanner device name, defaulting to `fujitsu`
--crop
Crop to contents (driver must support this)
--deskew
Run driver deskew (driver must support this)
--unpaper
Run post-processing deskew and black edge detection (requires unpaper)
--ocr
Run OCR to make the PDF searchable (requires tesseract and ImageMagick)
--skip-empty-pages
remove empty pages from resulting PDF document (e.g. one sided doc in duplex mode)
OUTPUT
-o, --output <outputfile>
Output to named file default=scan.pdf
-l, --outputlist <outputfile-1...outputfile-n> Output to named files for each scanned page, can be used with append
```
### Tips
The default scanner device is set to `fujitsu`. If you have another scanner,
you will need to use the `-x`/`--device` argument to specify your scanner.
See below for how to get the list of available devices.
The scanners and scanner drivers vary in features they support. This script
provides several options to the underlying scanner driver by default, and
these options may not be supported by your scanner/scanner driver. If
you are receiving an error about `--page-width`/`--page-height` being
unrecognized options, try the `--no-default-size` option. If you receive an
error about the `--mode` value being invalid, try `--mode-hw-default`
and see below for how to retrieve the list of modes that your system understands.
### Helpful Commands
List available scanner devices (for `-x`/`--device` argument):
```
scanadf -L
```
List available device-specific options, including acceptable values for
`-m`/`--mode` and `-r`/`--resolution`:
```
scanadf [-d <device>] --help
```
## Author
* [Raman Gupta](https://github.com/rocketraman/)
## Contributors
* [Stefan Armbruster](https://github.com/sarmbruster/)
## Other Useful Software
* [OCRmyPDF](https://github.com/jbarlow83/OCRmyPDF) - forgot to use the `--ocr` option at scanning time? use this