Scraping Stuff

http://www.hack.invariable.org/scrape.txt

http://www.hack.invariable.org/scrape.html

Here is some site scraping for your PHP/HTML enjoyment. The first link is a .txt version of the script that you can simply rename to .php. The second is the page that displays the scraped content (right-click and view source to see how it works).

This past week in the Hacker’s Club we worked on scraping content from another web page in PHP using cURL:

http://www.php.net/manual/en/curl.examples.php

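Once the page content is scraped, it is served from our own website. That matters because browsers normally block scripts from reading pages on other domains, while a page on our own domain is fair game. So we can pull and modify the content using AJAX (which we do on scrape.html above).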

You’ll notice that scrape.php (scrape.txt above) requires a query parameter: “q”. When we append this “q” (a URL) to the address of scrape.php, it will scrape any page we want from the Internets. For example, we wanted to scrape the content from http://www.pfcompanion.com/spells/cleric-spells.html, so we added ?q=http://www.pfcompanion.com/spells/cleric-spells.html to the end of the scrape.php URL:

http://www.hack.invariable.org/scrape.php?q=http://www.pfcompanion.com/spells/cleric-spells.html
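If you are building that address from JavaScript rather than typing it by hand, here is a minimal sketch. The scrape.php address and the "q" parameter come from the example above; the helper name buildScrapeUrl and the encoding step are our own additions for illustration. Encoding the target is a precaution: it keeps any "?" or "&" in the target URL from being read as extra parameters to scrape.php. This assumes scrape.php reads "q" in the usual PHP way, which decodes it automatically.

```javascript
// Hypothetical helper (not part of scrape.php): builds the request URL
// for a page we want scraped. The default base is the address from the post.
function buildScrapeUrl(targetUrl, base = "http://www.hack.invariable.org/scrape.php") {
  // encodeURIComponent escapes ":", "/", "?" and "&" so the target's own
  // query string cannot be mistaken for parameters of scrape.php.
  return base + "?q=" + encodeURIComponent(targetUrl);
}

const scrapeUrl = buildScrapeUrl("http://www.pfcompanion.com/spells/cleric-spells.html");
console.log(scrapeUrl);
// http://www.hack.invariable.org/scrape.php?q=http%3A%2F%2Fwww.pfcompanion.com%2Fspells%2Fcleric-spells.html
```

The plain, unencoded form shown above works fine for simple addresses like this one; the encoded form only starts to matter when the target URL carries its own query string.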

Now we get the page we want, and on scrape.html we can pull out just the content we want, line by line or cell by cell. Using JavaScript, we can put that information into any format we want on the fly.
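To make "cell by cell" concrete, here is a rough sketch of pulling table cells out of a scraped HTML string and reshaping them into objects. It is not the code from scrape.html: the function name extractTableRows and the sample table (including the spell names) are made up for illustration. It also assumes simple table markup with no nested tables. Regular expressions are a blunt tool for HTML, so in a browser the built-in DOMParser is the sturdier choice.

```javascript
// Rough sketch: collect the text of every cell, grouped by table row.
// Assumes simple <tr>/<td>/<th> markup with no nested tables.
function extractTableRows(html) {
  const rows = [];
  const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  const cellPattern = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
  for (const rowMatch of html.matchAll(rowPattern)) {
    const cells = [];
    for (const cellMatch of rowMatch[1].matchAll(cellPattern)) {
      // Drop inner tags (links, bold, etc.) and collapse whitespace.
      cells.push(cellMatch[1].replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim());
    }
    if (cells.length > 0) rows.push(cells);
  }
  return rows;
}

// Made-up sample standing in for the HTML that scrape.php would return.
const sampleHtml = `
<table>
  <tr><th>Spell</th><th>Level</th></tr>
  <tr><td><a href="#">Example Spell A</a></td><td>1</td></tr>
  <tr><td>Example Spell B</td><td> 2 </td></tr>
</table>`;

// Treat the first row as column names and turn the rest into objects.
const [header, ...body] = extractTableRows(sampleHtml);
const records = body.map((cells) =>
  Object.fromEntries(header.map((name, i) => [name, cells[i]]))
);
console.log(records);
```

On the real page, the HTML string would come from the AJAX response instead of a hard-coded sample, and the resulting objects could then be written back into the page in whatever layout you like.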

Mizzou Server Hackerspace

4 out of 5 Moms recommend this hackerspace
