prabhakar267/image2text
Image2Text is a Python wrapper that extracts text from images and saves it as text files using the Google Tesseract Engine. Tesseract is an optical character recognition (OCR) engine for various operating systems. It is free software, released under the Apache License, Version 2.0, and its development has been sponsored by Google since 2006. In 2006, Tesseract was considered one of the most accurate open-source OCR engines then available.
python main.py -i <input_path> -o <output_path>
usage: main.py [-h] -i INPUT [-o OUTPUT] [-d]

required arguments:
  -i INPUT, --input INPUT      Single image file path or images directory path

optional arguments:
  -o OUTPUT, --output OUTPUT   (Optional) Output directory for converted text
  -d, --debug                  Enable verbose DEBUG logging
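For readers who want to see how a command-line interface like this is typically declared, here is a minimal `argparse` sketch that reproduces the flags in the usage text above. It is an illustration only, not the project's actual `main.py`; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented usage (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="main.py")

    # A separate group so that -i is listed under "required arguments".
    required = parser.add_argument_group("required arguments")
    required.add_argument(
        "-i", "--input", required=True,
        help="Single image file path or images directory path",
    )

    parser.add_argument(
        "-o", "--output",
        help="(Optional) Output directory for converted text",
    )
    parser.add_argument(
        "-d", "--debug", action="store_true",
        help="Enable verbose DEBUG logging",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere without command-line input.
args = build_parser().parse_args(["-i", "sample/", "-o", "output/"])
print(args.input, args.output, args.debug)
```

Passing an explicit list to `parse_args` keeps the example independent of how the script is launched; a real entry point would call `parse_args()` with no arguments.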
python main.py -i sample/
or
python main.py -i sample/ -o output/
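The examples above accept either a single image or a directory of images. As a rough sketch of what a wrapper around the `tesseract` command-line tool could do with those inputs, the code below pairs each image with a target `.txt` file and then calls `tesseract <image> <outputbase>`. The helper names (`plan_conversions`, `run_tesseract`) and the list of image extensions are assumptions made for illustration; the project's own implementation may differ.

```python
from pathlib import Path
import subprocess

# Assumed set of image extensions; the README does not list supported formats.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def plan_conversions(input_path, output_dir):
    """Return (image, text_file) pairs for a single image or a directory."""
    src = Path(input_path)
    out = Path(output_dir)
    if src.is_dir():
        # Only direct children with a known image extension, in stable order.
        images = sorted(
            p for p in src.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    else:
        images = [src]
    return [(img, out / (img.stem + ".txt")) for img in images]


def run_tesseract(image, text_file):
    """Run the tesseract CLI on one image.

    tesseract takes an output *base* name and appends '.txt' itself,
    so the suffix is stripped before the call.
    """
    text_file.parent.mkdir(parents=True, exist_ok=True)
    output_base = text_file.with_suffix("")
    subprocess.run(["tesseract", str(image), str(output_base)], check=True)


# Show the planned conversions without invoking tesseract.
for image, text_file in plan_conversions("sample/", "output/"):
    print(image, "->", text_file)
```

Separating the path planning from the subprocess call keeps the file-handling logic testable on machines where Tesseract is not installed.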
python -m unittest
[sudo] apt-get install tesseract-ocr
- Install tesseract-ocr from UB Mannheim: https://github.com/UB-Mannheim/tesseract/wiki
- Add the installed Tesseract-OCR directory path to the `PATH` system variable
brew install tesseract
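After installing on any platform, one quick way to confirm that the `tesseract` executable is reachable through `PATH` is the standard-library `shutil.which`. This is a small sketch; the helper name `find_tesseract` is hypothetical and not part of the project.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; check the installation steps above")
else:
    print("tesseract found at", path)
```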
(Wikipedia page for Google | Lang : Simple English)
A man signing in at Google’s main afice, Googleplex.
Google Inc. is an American multinational corporation
that is best known for running one of the largest search
engines on the World Wide Web (WWW). Every day,
200 million (200,000,000) people use it. Google’s main
office (“Googleplex”) is in Mountain View, California,
USA.
With Google Search, people can also search for pictures,
Usenet newsgroups, news, and things to buy online. By
June 2004, Google had 4.28 billion web pages on its
database, 880 million (880,000,000) pictures and 845
million (845,000,000) Usenet messages — six billion
things.
“To google,” as an action word (verb) means “to search
for something on Google”. Because Google is so popular
(more than half of people on the web use it) it has been
used to mean “to search the web”. Google dislikes this
use since the name of the company is a trademark.
As a public company, Google Inc. trades on the
NASDAQ under the tickers GOOG and GOOGL.
In August 2015, Google announced it was being restruc-
tured under a new holding company called Alphabet Inc.
1 History
Google was started in early 1996 by Larry Page and
Sergey Brin, two students at Stanford University, USA.
It used to be called Backrub. Later, they made it into a
company, Google Inc., on September 7, 1998 at a friend’s
garage in Menlo Park, California. In February 1999, the
company moved to 165 University Ave., Palo Alto, Cal-
ifornia. Later that year, it moved to another place, now
called the “Googleplex”.
In September 2001, Google’s rating system (“PageR-
ank”, for saying which information is more helpful) got a
US. Patent. The patent was to Stanford University, with
Lawrence (Larry) Page as the inventor (the person who
first had the idea).
Google makes an important, though shrinking, percent-
age of its money through its friends like America Online
and InterActiveCorp. It has a special group known as the
Partner Solutions Organization (PSO) which helps make
contracts, helps making accounts better, and gives engi-
neering help.
2 How Google makes money
Google makes money by advertising. People or compa-
nies who want people to buy their product, service, or
ideas give Google money, and Google shows an adver-
tisement to people Google thinks will click on the adver-
tisement. Google only gets money when people click on
the link, so it tries to know as much about people as pos-
sible to only show the advertisement to the “right people”.
It does this with Google Analytics, which sends data back
to Google whenever someone visits a web site. From this
and other data, Google makes a profile about the person,
which it then uses to figure out which advertisements to
show.
3 The name “Google”
The name “Google” is a misspelling of the word
g00g01.[7][8] Milton Sirotta, nephew of US. mathemati-
cian Edward Kasner, made this word in 1938, for the
number 1 followed by one hundred zeroes ( 10100 ). It
is said that the word “googol” was chosen as a name for
this number because it sounded like baby talk. Google
uses this word because the company wants to make lots
of stuff on the Web easy to find and use. Andy Bechtol-
sheim first thought of the name.
The name for Google’s main office, the “Googleplex,” is a
play on a different, even bigger number, the "googolpleX",
which is 1 followed by one googol of zeroes.