Examiner.com writers can save their work using this purpose-built screen-scraper script
I've been writing for Examiner.com for over 7 years, and with the news that they're going to shut down, I needed to retrieve over 540 articles to repost them on my own website. Lesson learned: it's better to own your own platform than to write for someone else's platform. Anyway, the result is a Node.js script I'm calling articlescraper. Its purpose is to traverse an index that might be split over multiple pages, then extract the articles from the pages linked from the index.
That is, the stereotypical blog website has pages listing article teasers in reverse chronological order. The index pages will have URLs like
http://example.com/path/to/index?page=##. The first script requests all the ?page=## pages, saving the data for each teaser in a YAML file.
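To make the ?page=## idea concrete, here is a minimal sketch of how the list of index URLs could be generated. It is not the code from articlescraper; the function name buildIndexUrls and the page range are assumptions for illustration, and it uses only the URL class built into Node.js.

```javascript
// Hypothetical helper (not part of articlescraper): build the list of
// paginated index URLs for pages firstPage..lastPage, inclusive.
function buildIndexUrls(indexUrl, firstPage, lastPage) {
  const urls = [];
  for (let page = firstPage; page <= lastPage; page++) {
    const url = new URL(indexUrl);               // WHATWG URL, built into Node.js
    url.searchParams.set('page', String(page));  // adds or replaces ?page=N
    urls.push(url.toString());
  }
  return urls;
}

// Example: the first three index pages of the placeholder site.
console.log(buildIndexUrls('http://example.com/path/to/index', 0, 2));
```

Each URL in the resulting list would then be fetched in turn and its teasers collected. Whether page numbering starts at 0 or 1 depends on the site, so check the real index pages first.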
The second script reads that YAML file and retrieves each article. It saves the article data in another YAML file, and also detects all images and downloads them.
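The image-detection step can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual articlescraper code: extractImageUrls is a hypothetical name, and it uses a simple regular expression so the example runs with nothing but Node.js built-ins. A real scraper would more likely use a proper HTML parser with CSS selectors.

```javascript
// Hypothetical helper (not part of articlescraper): find the src of every
// <img> tag in an article's HTML and resolve it against the article URL.
function extractImageUrls(html, articleUrl) {
  // Matches <img ... src="..."> or <img ... src='...'>; unquoted src
  // attributes are ignored in this simplified sketch.
  const imgSrc = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  const found = new Set();                 // a Set drops duplicate URLs
  for (const match of html.matchAll(imgSrc)) {
    // new URL(relative, base) turns /img/a.png into an absolute URL.
    found.add(new URL(match[2], articleUrl).href);
  }
  return [...found];
}

// Example with a made-up article body.
const sampleHtml = '<p><img src="/img/a.png"><img alt="x" src="photo.jpg"></p>';
console.log(extractImageUrls(sampleHtml, 'http://example.com/articles/one'));
```

Resolving every src against the article URL matters because many sites use relative image paths, and those cannot be downloaded without the page address they came from.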
The documentation shows how to configure the scripts for Examiner.com. However, they should be useful for websites built on other technology as well. I have a Blogger blog that I want to move somewhere else, perhaps to my WordPress blog, and that should be easy to accomplish by adjusting the selectors in the configuration file.
NOTE FOR EXAMINER AUTHORS: We own the copyright to our work posted on Examiner. Examiner never asserted ownership over our articles. That means we are fully within our rights to download our articles and resurrect them elsewhere.
Every so often a service will suddenly go belly-up, leaving its users scrambling to save their files. At least one photo-sharing site did this, and in the process vaporized a bunch of people's pictures. In some cases those were the only copies of those pictures. Think of the proud father whose only picture of his child disappeared in a puff of smoke just because a business suddenly vaporized. That's what is happening to us on Examiner.com right now.
You have until July 10 to retrieve your articles, or they'll be gone forever.