Scraping Stuff

http://www.hack.invariable.org/scrape.txt

http://www.hack.invariable.org/scrape.html

Here is some site scraping for your PHP/HTML enjoyment. The first link is a .txt version that you can simply rename to .php. The second is the page that displays the scraped content (right-click and choose View Source to see how it works).

This past week in the Hacker’s Club we worked on scraping content from another web page in PHP using cURL:

http://www.php.net/manual/en/curl.examples.php

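Once a page’s content is scraped, it is available from our own website. That matters because browsers’ same-origin policy normally blocks AJAX requests to another domain (unless that site explicitly allows them via CORS), while requests back to our own server are fine. So we can pull in and modify the content using AJAX (which we do on scrape.html above).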

You’ll notice that we’ve made scrape.php (scrape.txt above) require a query parameter, “q”. When we send this “q” (a URL) along at the end of the scrape.php URL, it will scrape any page we want from the Internets. For example, to scrape the content from http://www.pfcompanion.com/spells/cleric-spells.html, we added ?q=http://www.pfcompanion.com/spells/cleric-spells.html to the end of our scrape.php URL:

http://www.hack.invariable.org/scrape.php?q=http://www.pfcompanion.com/spells/cleric-spells.html
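Typing the target straight onto the end works for this example, but if a target URL contained its own `&` (PHP would split it into extra parameters) or `#` (the browser would treat the rest as a fragment and never send it), scrape.php would receive a truncated address. A minimal JavaScript sketch of building the request URL safely, assuming scrape.php reads only `q` (and, presumably, via `$_GET`, which decodes the value again); the helper name `buildScrapeUrl` is hypothetical:

```javascript
// Hypothetical helper: build a scrape.php request URL with the target page
// passed in the "q" query parameter.
function buildScrapeUrl(scraperBase, targetUrl) {
  const url = new URL(scraperBase);
  // URLSearchParams percent-encodes reserved characters such as ":", "/",
  // "&" and "#", so the whole target URL arrives as a single "q" value.
  url.searchParams.set("q", targetUrl);
  return url.toString();
}

const requestUrl = buildScrapeUrl(
  "http://www.hack.invariable.org/scrape.php",
  "http://www.pfcompanion.com/spells/cleric-spells.html"
);
console.log(requestUrl);
// http://www.hack.invariable.org/scrape.php?q=http%3A%2F%2Fwww.pfcompanion.com%2Fspells%2Fcleric-spells.html
```

In scrape.html, the resulting URL could then be requested with something like `fetch(requestUrl).then((r) => r.text())`.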

Now we get the page we want, and on scrape.html we can pull out just the content we need, line by line or cell by cell. Using JavaScript, we can then put that information into any format we want on the fly.
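The scrape.html code isn’t reproduced in this post, so as a rough illustration only, here is a sketch that pulls a scraped HTML table apart row by row and cell by cell. It uses simple regular expressions and assumes plain, non-nested `<tr>`/`<td>`/`<th>` markup; in the browser, parsing the response with `DOMParser` and walking it with `querySelectorAll("tr")` is the sturdier route. The function name `extractCells` and the sample markup are hypothetical.

```javascript
// Hypothetical helper: turn scraped HTML into an array of rows of cell text.
// Naive regex parsing; assumes simple, non-nested table markup.
// HTML entities such as &amp; are left as-is.
function extractCells(html) {
  const rows = [];
  const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  const cellPattern = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
  for (const rowMatch of html.matchAll(rowPattern)) {
    const cells = [];
    for (const cellMatch of rowMatch[1].matchAll(cellPattern)) {
      // Strip inner tags (links, <b>, etc.) and surrounding whitespace.
      cells.push(cellMatch[1].replace(/<[^>]+>/g, "").trim());
    }
    rows.push(cells);
  }
  return rows;
}

const sample =
  "<table><tr><th>Spell</th><th>Level</th></tr>" +
  "<tr><td><a href='#'>Bless</a></td><td>1</td></tr></table>";
console.log(extractCells(sample));
```

Once the cells are plain arrays, reshaping them into a list, a new table, or JSON is ordinary JavaScript.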


SQLing It Up

SQL databases are a standard part of many Web applications and content management systems. This week the server hackers took a stab at connecting to and reading from a MySQL database using PHP:

http://www.hack.invariable.org/index.txt

Check out the code we worked on and let us know if you have questions. Next week we may try some updating, reading and deleting on the database.
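The index.txt code isn’t reproduced here, so this is only a sketch of what reading, updating and the other basic operations look like in SQL. The statements are written as parameterized queries with `?` placeholders (a style both PHP’s mysqli and PDO prepared statements accept), so user input is bound separately instead of being pasted into the query. The `spells` table, its columns, and the `crud` object are hypothetical.

```javascript
// Hypothetical table: spells(id, name, level).
// Each builder returns the SQL text with "?" placeholders plus the values to
// bind, so user input is never concatenated into the query string.
const crud = {
  create: (name, level) => ({
    sql: "INSERT INTO spells (name, level) VALUES (?, ?)",
    params: [name, level],
  }),
  read: (id) => ({
    sql: "SELECT id, name, level FROM spells WHERE id = ?",
    params: [id],
  }),
  update: (id, level) => ({
    sql: "UPDATE spells SET level = ? WHERE id = ?",
    params: [level, id], // order must follow the placeholders
  }),
  delete: (id) => ({
    sql: "DELETE FROM spells WHERE id = ?",
    params: [id],
  }),
};

console.log(crud.update(7, 2));
```

In PHP, the same `sql` string would go to `$mysqli->prepare()` followed by `$stmt->bind_param()`, or to PDO’s `prepare()` followed by `execute()` with the values.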

 

Making Pi

Photo from Wikimedia by cowjuice, http://commons.wikimedia.org/wiki/File:Raspberry_Pi_Photo.jpg?uselang=en

Raspberry Pi: inedible, but oh so sweet

Over these last two sessions (with Spring Break in the middle), our group has looked at setting up a Raspberry Pi as a server, and we are still working through connection issues with the Cloud9 IDE.

For the Raspberry Pi server, we had a fun string of events. First, we had to find power for the thing (no power cord). We used the onboard micro USB port and connected it to a powered USB port on a computer. Problem solved. After connecting an Ethernet cable, a Mac mouse/keyboard combo, and an HDMI cable, we realized that no monitors or Macs in the area had HDMI input. What a crock! After scrounging around the office for a usable monitor, and even trying an old Panasonic TV (the room looked like a Texas Instruments developer’s lab from the 80s before we were done), we finally found a large TV with HDMI and used that.

We promptly installed Raspbian OS (a Linux distro for Raspberry Pi) on an SD card, which the Raspberry Pi uses as its hard drive, and we were pleasantly impressed with the results. The X desktop UI is nice, if a tiny bit slow on the Raspberry Pi. Next, we installed Apache using sudo in the terminal (we tried to follow an old Instructables article for this, but it turned out to be too outdated to work):

sudo apt-get update

And then:

sudo apt-get upgrade

And then:

sudo apt-get install apache2

After that, we were good to go, and our server was up and running smoothly on 127.0.1.1 (Hurray!). Just in case you can’t read code: the first command refreshes the list of available packages, the second upgrades the installed packages to their latest versions, and the third installs the Apache 2 web server. Next week we’ll likely try to get this server out to the world.
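A note for that next step: 127.0.1.1 sits in the 127.0.0.0/8 loopback range (on Debian-based systems such as Raspbian it is typically the address the machine’s own hostname resolves to), so it only reaches the server from the Pi itself. Other machines need the Pi’s network address. As a small sketch, assuming Node.js is installed on the Pi (it isn’t part of the steps above), this lists the non-loopback IPv4 addresses another computer could try in its browser:

```javascript
const os = require("os");

// True for any IPv4 address in the 127.0.0.0/8 loopback range.
function isLoopback(ipv4) {
  return ipv4.split(".")[0] === "127";
}

// Non-loopback IPv4 addresses of this machine (e.g. the Pi's Ethernet address).
function lanAddresses() {
  const result = [];
  for (const entries of Object.values(os.networkInterfaces())) {
    for (const entry of entries || []) {
      // Some Node versions report family as the number 4 instead of "IPv4".
      const isV4 = entry.family === "IPv4" || entry.family === 4;
      if (isV4 && !entry.internal && !isLoopback(entry.address)) {
        result.push(entry.address);
      }
    }
  }
  return result;
}

console.log(isLoopback("127.0.1.1")); // true
console.log(lanAddresses());
```

Assuming no firewall is in the way, browsing to one of those addresses from another machine on the same network should show Apache’s default page.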

The Rise of the Machines

Old-timey machine shop from http://commons.wikimedia.org/wiki/File:Machine-shop-r.jpg

The age of the machines has come, and we as mere humans want to understand our overlords better. This week, in celebration of the World Wide Web’s 25th anniversary, we have inaugurated an informal weekly get-together at the University of Missouri Columbia Digital Media Zone around the topic of the machines that make the Web work: servers.

This blog (and possibly others) will serve as the log of our adventures into all things server-related. Our weekly meetings, Thursdays 6pm-7:30pm in the Digital Media Zone in Townsend Hall, are going to be short hackathons on the following topics:

  • Making our own servers (with Raspberry Pi, Android, Intel Galileo and such)
  • Working with our own Apache server space (free unlimited space if you join the club, which is also free)
  • Using JS server-side (Node.js)
  • MySQL, phpMyAdmin, MongoDB, CouchDB, PouchDB, LocalStorage syncing with a RESTful API
  • PHP
  • What makes blogs work on the server side
  • HTTP and RESTful goodness
  • Other useful acronyms
  • Scraping and visualizing data
  • General server hackery

If you are interested in any (or all) of the above, or in any other topic remotely server-centric, we’d love to see you at the weekly meeting in the Zone. We want this to be a learning time for all, and to be a place to safely explore our semi-sentient neighbors who live at the farm (otherwise known as the data center). We would also like to start hosting Google Hangouts during our meetings so that all of you who don’t actually exist in Columbia, Missouri can pretend like you do.

Our first meeting of fellow hackers went quite well. Everybody received their own server space on wadholm.com, a WordPress.org blog (if they wanted one), and an introduction to the Cloud9 IDE (http://c9.io). Cloud9 is an awesome (and beautifully elegant) development environment, and it has a great option to connect via FTP. To set it up, create an account, go to your dashboard, click Create New Workspace > FTP, and enter your credentials for a non-SFTP connection (so Bengal spaces can’t be connected to, but wadholm.com spaces can). Then wait a bit (you may need to refresh the page), click on your new workspace, and click Start Editing.

Next week we’ll dive into the two projects we were all most interested in: creating our own servers, and working with a database to Create, Read, Update, and Delete stuff (and all that CRUD). Hope to see you there. But not all of you. At least not if the whole world is reading this.
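In a RESTful API (one of the topics on our list), the four CRUD operations usually map onto HTTP methods: POST creates, GET reads, PUT updates, and DELETE deletes. A minimal in-memory sketch in Node.js; the `/items` path, the `handle` function, and the `Map` standing in for a database are all hypothetical, and a real server would call something like this from `http.createServer` and talk to MySQL or MongoDB instead.

```javascript
// Hypothetical in-memory store standing in for a database table.
const items = new Map();
let nextId = 1;

const notFound = { status: 404, body: null };

// Route an HTTP method + path to a CRUD operation and return { status, body },
// roughly what a request handler would send back.
function handle(method, path, body = {}) {
  const match = path.match(/^\/items(?:\/(\d+))?$/);
  if (!match) return notFound;
  const id = match[1] === undefined ? null : Number(match[1]);

  if (method === "POST" && id === null) {
    // Create: the server assigns the id.
    const item = { ...body, id: nextId++ };
    items.set(item.id, item);
    return { status: 201, body: item };
  }
  if (method === "GET") {
    // Read: one item, or the whole collection.
    if (id === null) return { status: 200, body: [...items.values()] };
    return items.has(id) ? { status: 200, body: items.get(id) } : notFound;
  }
  if (method === "PUT" && id !== null) {
    // Update: replace the stored item.
    if (!items.has(id)) return notFound;
    const item = { ...body, id };
    items.set(id, item);
    return { status: 200, body: item };
  }
  if (method === "DELETE" && id !== null) {
    // Delete
    return items.delete(id) ? { status: 204, body: null } : notFound;
  }
  return { status: 405, body: null }; // method not allowed on this path
}
```

For example, on a fresh store `handle("POST", "/items", { name: "Bless" })` returns status 201 with `{ name: "Bless", id: 1 }`; after `handle("DELETE", "/items/1")` (status 204), a GET of `/items/1` comes back 404.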