Arale, a Java web spider
Arale can download entire web sites or specific resources from the web. It can also render dynamic sites to static pages. I wrote this utility in 2001 to familiarize myself with the java.net.* package. I’m no longer actively maintaining it, and the code is rather messy, but the spider works fine.
Areas of interest
- Web development
- Advanced web browsing
Features
- Download and scan user-defined file types.
- Rename dynamic resources. Encode query strings into filenames.
- Set the number of simultaneous connections.
- Options for minimum and maximum file size.
- Domain depth support.
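To make the file-type and file-size options concrete, here is a minimal sketch of such a filter. The class `DownloadFilterSketch` and its method names are hypothetical and are not taken from Arale's source; it only illustrates the idea of accepting a resource by extension and by size range.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not Arale's actual code.
final class DownloadFilterSketch {
    private final Set<String> extensions = new HashSet<>();
    private final long minBytes;
    private final long maxBytes;

    // extensions are given without the dot, e.g. "mp3", "zip"
    DownloadFilterSketch(long minBytes, long maxBytes, String... extensions) {
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
        for (String ext : extensions) {
            this.extensions.add(ext.toLowerCase(Locale.ROOT));
        }
    }

    // True when the file has one of the wanted extensions and its size
    // lies within [minBytes, maxBytes].
    boolean accepts(String fileName, long sizeBytes) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(ext) && sizeBytes >= minBytes && sizeBytes <= maxBytes;
    }
}
```

For instance, a filter built with `new DownloadFilterSketch(1024, 10_000_000, "mp3", "zip")` would accept a 5000-byte `song.MP3` and reject both `page.html` and a 10-byte `tiny.zip`.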
While many bots focus on page indexing, Arale is primarily designed for personal use. It fits the needs of advanced web surfers and web developers. Some real-life cases are:
- Downloading only images, videos, MP3 or ZIP files from a site.
- Manuals, articles or ebooks split across many files to discourage downloading.
- User-unfriendly sites, where popups, banners and tricky scripts get in the way before you can download a resource.
Multithreaded means that Arale can download more than one file simultaneously. Arale can easily saturate your bandwidth, thus providing the fastest possible download speed for your internet connection.
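The idea of a bounded number of simultaneous connections can be sketched with a fixed-size thread pool. This is an assumption about one reasonable way to do it, not a description of Arale's internals; `ParallelFetchSketch`, `Fetcher` and `fetchAll` are hypothetical names, and the actual download logic is left to the caller-supplied `Fetcher`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not Arale's actual code.
final class ParallelFetchSketch {

    // Hypothetical callback: downloads one URL and returns the number of bytes read.
    interface Fetcher {
        long fetch(String url) throws Exception;
    }

    // Runs at most maxConnections downloads at a time (maxConnections must be > 0)
    // and returns the byte counts in the same order as the input URLs.
    static List<Long> fetchAll(List<String> urls, int maxConnections, Fetcher fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConnections);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetcher.fetch(url));
            }
            List<Long> sizes = new ArrayList<>();
            // invokeAll blocks until every task has finished
            for (Future<Long> done : pool.invokeAll(tasks)) {
                sizes.add(done.get());
            }
            return sizes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while downloading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a download failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A larger pool uses more of the available bandwidth, up to the point where the connection or the remote server becomes the bottleneck.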
If you’re developing dynamic sites using technologies such as JSP, PHP or ASP, you may be interested in rendering dynamic pages to static files.
Arale supports URL renaming: the query string is encoded in the static filename and an .html extension is appended. Let’s look at an example:
- original URL:
- static filename:
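Arale's exact naming scheme is not shown here, so the following is only a minimal sketch of the general idea. It assumes a made-up scheme in which every character of the query string outside letters, digits, dots and hyphens is replaced by an underscore. The class `StaticNameSketch`, the method `toStaticName` and the example.com URL are all hypothetical.

```java
import java.net.URI;

// Hypothetical illustration, not Arale's actual code.
final class StaticNameSketch {

    // Maps a URL to a file name: the last path segment, followed by the
    // sanitized query string and ".html" when a query string is present.
    static String toStaticName(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            name = "index"; // assumed default for directory-style URLs
        }
        String query = uri.getRawQuery();
        if (query == null) {
            return name; // no query string: nothing to rename
        }
        // Assumed encoding: replace characters that are awkward in file names
        String encoded = query.replaceAll("[^A-Za-z0-9.-]", "_");
        return name + "_" + encoded + ".html";
    }
}
```

Under these assumptions, `toStaticName("http://example.com/shop/item.jsp?id=7&lang=en")` yields `item.jsp_id_7_lang_en.html`, while a URL without a query string keeps its original file name.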
Existing links to renamed URLs are replaced with the modified links, which preserves navigation among the static files. Once a dynamic site has been transformed into a set of static files, it can be deployed on a server that does not support dynamic pages. For example, you may deploy a JSP site on free web space.
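Link substitution can be sketched as a pass over the downloaded HTML that swaps each known dynamic link for its static name. The sketch below is an illustration under simplifying assumptions, not Arale's implementation: it only looks at double-quoted `href` attributes, and the names `LinkRewriteSketch` and `rewriteLinks` are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not Arale's actual code.
final class LinkRewriteSketch {

    // Simplification: matches only href="..." with double quotes.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // Replaces every href whose target appears in the renamed map;
    // all other links are left untouched.
    static String rewriteLinks(String html, Map<String, String> renamed) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = m.group(1);
            String replacement = renamed.containsKey(target) ? renamed.get(target) : target;
            // quoteReplacement keeps '$' and '\' in file names from being
            // interpreted as group references
            m.appendReplacement(out, Matcher.quoteReplacement("href=\"" + replacement + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A real spider would also need to handle single-quoted attributes, `src` attributes and relative versus absolute URLs; an HTML parser is usually more robust than a regular expression for that.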
Currently Arale is a command-line tool. It would be nice to develop a GUI for it. I’d like to have some feedback from users, so if you think it’s worth it, send me an email and tell me what you think. ;)