Code Snippet: Ruby Image Scraper
Posted by Ryan Baxter Tue, 14 Aug 2007 03:46:00 GMT
Last week I stumbled upon a screen-scraping library for Ruby called scrAPI. It’s extremely flexible and can be seen in action on the co.mments blog post scraper. The scrAPI library can be installed by issuing the following command from your console:
gem install scrapi
Testing scrAPI was fairly easy once I figured out how to define a scraper. With that out of the way, I wrote a small script that saves the images from a URL provided by the user. The scrAPI library can be used for good or evil; only you can decide which.
#!/usr/bin/ruby
require 'fileutils'
require 'open-uri'
require 'pathname'
require 'rubygems'
require 'scrapi'
# Get the URL input.
puts 'Enter a URL:'
url = gets.chomp
# Get the HTML source.
html = nil
open(url) {|source| html = source.read()}
# Define the scraper.
scraper = Scraper.define do
  array :images
  process "img", :images => "@src"
  result :images
end
# Scrape the HTML for images.
images = scraper.scrape(html)
# Create a directory to save the images in.
# Strip the scheme (http or https) and create any nested directories as needed.
directory = url.sub(/\Ahttps?:\/\//, '')
FileUtils.mkdir_p directory
images.each do |image_path|
  # Resolve image_path against the page URL so that relative,
  # root-relative, and fully qualified paths all become full URLs.
  image_url = URI.join(url, image_path).to_s
  # Write the image to disk.
  open(image_url) do |source|
    file_name = image_url.split('/').last
    open(File.join(directory, file_name), 'wb') {|file| file.write(source.read())}
  end
end
puts 'Finished...'
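The trickiest step in the script is turning each `src` value into a URL that can actually be fetched, and then picking a file name to save it under. Below is a minimal sketch of one way to handle both with Ruby's standard `uri` library. The helper names `resolve_image_url` and `local_file_name` are hypothetical, and the example.com addresses are placeholders, not anything from scrAPI itself.

```ruby
require 'uri'

# Hypothetical helper: resolve an <img> src value against the page URL.
# URI.join handles relative paths, root-relative paths, and full URLs.
def resolve_image_url(page_url, src)
  URI.join(page_url, src).to_s
end

# Hypothetical helper: derive a local file name from an image URL,
# using only the path so that any query string is ignored.
def local_file_name(image_url)
  File.basename(URI.parse(image_url).path)
end

page = 'http://example.com/blog/post.html'

puts resolve_image_url(page, 'images/a.png')
# => http://example.com/blog/images/a.png
puts resolve_image_url(page, '/images/a.png')
# => http://example.com/images/a.png
puts resolve_image_url(page, 'http://cdn.example.org/a.png')
# => http://cdn.example.org/a.png
puts local_file_name('http://example.com/images/a.png?v=2')
# => a.png
```

Resolving against the page URL rather than concatenating strings means a `src` such as `images/a.png` is interpreted relative to the page's own directory, which is how a browser would treat it.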

