Once page content is scraped, it is available on our own website, meaning we can pull and modify the content using AJAX (which we do on scrape.html above).
You’ll notice that we’ve set up scrape.php (scrape.txt above) to require a query parameter: “q”. When we send along this “q” (a URL) at the end of the scrape.php URL, it will scrape any page we want from the Internets. For example, we wanted to scrape the content from http://www.pfcompanion.com/spells/cleric-spells.html, so we added ?q=http://www.pfcompanion.com/spells/cleric-spells.html to the end of our URL at scrape.php.
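To make the “q” idea concrete, here is a minimal sketch of building such a request URL in Python. The endpoint http://example.com/scrape.php and the function name build_scrape_url are placeholders, not the actual address or code used above. The sketch also percent-encodes the target URL, which is an extra precaution; the post simply appends the raw URL.

```python
from urllib.parse import urlencode

# Hypothetical location of scrape.php; the real host is not given in this post.
SCRAPE_ENDPOINT = "http://example.com/scrape.php"


def build_scrape_url(target_url, endpoint=SCRAPE_ENDPOINT):
    """Return the endpoint URL with target_url passed as the "q" query parameter."""
    # urlencode percent-encodes characters such as ":" and "/" so the whole
    # target URL travels as a single query value.
    return endpoint + "?" + urlencode({"q": target_url})


if __name__ == "__main__":
    print(build_scrape_url("http://www.pfcompanion.com/spells/cleric-spells.html"))
```

Encoding matters most when the page being scraped has its own query string, since an unencoded “&” or “?” in the target would otherwise be read as part of the scrape.php request.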
Over these last two sessions (with a Spring Break in the middle), our group has looked at setting up a Raspberry Pi as a server, and we are still working on connection issues with Cloud9 IDE.
For the Raspberry Pi server, we had a fun string of events. First, we had to find power for the thing (no power cord). We used the micro USB port onboard and connected that to a powered USB port on a computer. Problem solved. After connecting an ethernet wire, a Mac mouse/keyboard combo, and an HDMI cord, we realized that no monitors or Macs in the area had HDMI input. What a crock! After scrounging around in the office for a usable monitor, and even trying an old Panasonic TV (the room looked like a Texas Instruments developer’s lab from the 80s before we were done), we finally found a large TV with HDMI, and used that.
We promptly installed Raspbian OS (a Linux distro for Raspberry Pi) on an SD card that the Raspberry Pi uses as a hard drive, and we were pleasantly impressed with the results. The X desktop UI is nice, if a tiny bit slow on the Raspberry Pi. Next, we installed Apache with the sudo command in the terminal (we tried to follow an old Instructables article for this, but it ended up being too old to work):
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install apache2
After doing this, we were good to go, and our server was up and running smoothly on 127.0.1.1 (Hurray!). In case you can’t read code: the first command refreshes the package lists, the second upgrades the packages already installed, and the third installs apache2. Next week we’ll likely try to get this server out to the world.
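If you would rather check from a script than from a browser, something like the following sketch could confirm that the server answers. It assumes Python 3 is available on the Pi and that Apache is listening on the default port 80 at the address mentioned above. The function name server_is_up is made up for this example.

```python
from urllib.error import URLError
from urllib.request import urlopen


def server_is_up(url="http://127.0.1.1/", timeout=5):
    """Return True if an HTTP server answers at url with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    print("Server is answering" if server_is_up() else "No response")
```

This only shows that something is serving HTTP on the local machine; it says nothing about whether the server can be reached from the outside world.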
The age of the machines has come, and we as mere humans want to understand our overlords better. This week, in celebration of the World Wide Web’s 25th anniversary, we have inaugurated an informal weekly get-together at the University of Missouri Columbia Digital Media Zone around the topic of the machines that make the Web work: servers.
This blog (and possibly others) will serve as the log of our adventures into all things server-related. Our weekly meetings, Thursdays 6pm-7:30pm in the Digital Media Zone in Townsend Hall, are going to be short hackathons on the following topics:
Making our own servers (with Raspberry Pi, Android, Intel Galileo and such)
Working with our own Apache server space (free unlimited space if you join the club, which is also free)
Using JS server-side (Node.js)
MySQL, phpMyAdmin, MongoDB, CouchDB, PouchDB, localStorage syncing with a RESTful API
What makes blogs work on the server side
HTTP and RESTful goodness
Other useful acronyms
Scraping and visualizing data
General server hackery
If you are interested in any (or all) of the above, or in any other topic remotely server-centric, we’d love to see you at the weekly meeting in the Zone. We want this to be a learning time for all, and a place to safely explore our semi-sentient neighbors who live at the farm (otherwise known as the data center). We would also like to start hosting Google Hangouts during our meetings so that all of you who don’t actually exist in Columbia, Missouri can pretend like you do.
Our first meeting of fellow hackers went quite well, and everybody received their own server space on wadholm.com, a WordPress.org blog (if they wanted one) and an introduction to the Cloud9 IDE (http://c9.io). Cloud9 is an awesome (and beautifully elegant) development environment, and has a great option to connect via FTP. To use it, set up an account, go to your dashboard, click on Create New Workspace>FTP, and enter your credentials for a non-SFTP connection (so Bengal spaces can’t be connected to, but wadholm.com spaces can). Then wait a bit (you may need to refresh the page), click on your new workspace, and click on Start Editing. Next week we’ll dive into the two projects that we were all interested in first: creating our own servers, and working with a database to Create, Read, Update, and Delete stuff (and all that CRUD). Hope to see you there. But not all of you. At least not if the whole world is reading this.