
I've been doing a lot of work with complex search queries on Google lately. I have a script that grabs all the URLs returned for a given query. The problem is that although there are hundreds of thousands of results for most queries, Google will give you this notice if you try to get more than the first 1,000 results: "Sorry, Google does not serve more than 1000 results for any query."
This restriction became a problem for me. After doing some research, I ran across an interesting workaround that allows you to access twice as many results. Simply run the query twice: once requiring a common word and once excluding it. For example, if you are searching for proxy and want to get 2,000 results, use these two queries:
proxy +"the"
proxy -"the"
Now you have 2,000 unique results.
You can extend this idea to get 1000 * 2^n results, where n is the number of words you include/remove. With three words that is 1000 * 2^3 = 8,000. Here is how you could get 8,000 results:
proxy +"the" +"to" +"that"
proxy +"the" +"to" -"that"
proxy +"the" -"to" +"that"
proxy +"the" -"to" -"that"
proxy -"the" +"to" +"that"
proxy -"the" +"to" -"that"
proxy -"the" -"to" +"that"
proxy -"the" -"to" -"that"
Using common words like "the", "to" and "that" might not be the best tactic for all queries. For example, proxy -"the" -"to" -"that" will return almost entirely non-English pages, and that might not be what you are aiming for. A better approach would be to use words that are specific to your query. For proxy it might be best to use these words: "myspace", "copyright", "nph-proxy" and "anonymous".
I wrote a function in PHP that takes a query and a list of words as input and returns an array of queries.
function make_queries($query, $common_terms) {
    // Start with the bare query and double the list once per term.
    $queries = array($query);
    foreach ($common_terms as $term) {
        $new_queries = array();
        foreach ($queries as $q) {
            // One variant requires the term, the other excludes it.
            $new_queries[] = $q.' +"'.$term.'"';
            $new_queries[] = $q.' -"'.$term.'"';
        }
        $queries = $new_queries;
    }
    return $queries;
}
Example usage:
$common_terms = array('myspace', 'copyright', 'anonymous');
$queries = make_queries('proxy', $common_terms);
print_r($queries);
Be mindful of how many "common terms" you are using. Using 7 terms results in 128 different queries.
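To make the growth concrete, here is a small sketch of the arithmetic. The helper names count_queries and max_results are my own for illustration, and the 1,000 cap is simply the limit described above:

```php
<?php
// Hypothetical helper: number of queries produced by $n include/exclude terms.
function count_queries(int $n): int {
    return 2 ** $n;
}

// Upper bound on reachable results, assuming a cap of $cap results per query.
function max_results(int $n, int $cap = 1000): int {
    return $cap * count_queries($n);
}

foreach (array(1, 3, 7) as $n) {
    printf("%d terms: %d queries, up to %d results\n",
        $n, count_queries($n), max_results($n));
}
```

The real number of results will usually be lower than this upper bound, since some of the generated queries match fewer than 1,000 pages.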
Also, I recommend including &filter=0 in the URL so that Google doesn't filter out results that it thinks you don't want. You can do the filtering yourself.
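As a rough sketch, this is one way to build such a URL in PHP. The function name build_search_url and the base URL are assumptions for illustration; only the filter=0 parameter comes from the tip above:

```php
<?php
// Hypothetical helper: builds a search URL for one query with filter=0.
// The base URL is an assumption; adjust it to whatever your script requests.
function build_search_url(string $query): string {
    $params = array(
        'q'      => $query, // the search query, URL-encoded by http_build_query
        'filter' => 0,      // ask Google not to drop results it considers similar
    );
    // Pass '&' explicitly so the separator does not depend on php.ini settings.
    return 'https://www.google.com/search?' . http_build_query($params, '', '&');
}

echo build_search_url('proxy +"the"'), "\n";
```

Using http_build_query takes care of encoding the plus signs and quotes in the query, which are easy to get wrong when concatenating the URL by hand.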
Another hint is to use "site:.com", "site:.net" or "site:.org" as one of your common terms. Don't require more than one at a time, because a query with +site:.com +site:.net will not return any results. However, if you manually add -site:.com -site:.net -site:.org, you will get primarily foreign country domains.
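Here is a minimal sketch of how the site: hint could be turned into queries: one query per domain ending, plus one query that excludes all of them. The function name make_site_queries is hypothetical and is not part of the function above:

```php
<?php
// Hypothetical helper: one query per TLD, plus one query excluding every TLD.
function make_site_queries(string $query, array $tlds): array {
    $queries = array();
    $exclude_all = $query;
    foreach ($tlds as $tld) {
        // Require exactly one TLD per query; requiring two returns nothing.
        $queries[] = $query . ' +site:' . $tld;
        // Build up the "everything else" query as we go.
        $exclude_all .= ' -site:' . $tld;
    }
    $queries[] = $exclude_all;
    return $queries;
}

print_r(make_site_queries('proxy', array('.com', '.net', '.org')));
```

With three domain endings this gives four queries instead of the eight you would get from treating each ending as a separate include/remove term, and none of them is guaranteed to come back empty.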
Happy Googling!

August 21st, 2007 at 4:43 pm
cool trick! thanks
September 24th, 2007 at 1:28 am
Oh, very silly!
April 8th, 2008 at 10:02 am
Wow very interesting!! Nice work!! We are looking to do 20,000 or so listings from Google, would you be interested in possibly helping us out with this? Contact me by email if so. Thx