talktome.pl - a web robot to fetch pages just for the sake of it
The aim of this web robot is to imitate some of the behaviour of a human browser, so that the people who want to index everything I look at will have plenty of data to store about me.
It doesn't have to be very good at imitating me, because they aren't allowed to find out which page on a site I request... but at this point, things get more complicated.
These are just the questions that popped into my head while I was writing about the program. There will be more info somewhere on http://www.t8o.org/~mca1001/ .
I've signed it with my GPG key.
You can run the signed copy; there's no need to extract it. You just need to check that nobody has stuck anything on the top (before the start of the signature) or slipped extra PGP data blocks in somewhere.
The file should start:
#! /usr/bin/perl -Tw ' -----BEGIN PGP SIGNED MESSAGE-----
It is quite possible that differences in line endings will cause the signature check to fail under Windows or MacOS, if you've downloaded the file as text. I haven't tested this.
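The cryptographic check itself is GnuPG's job (``gpg --verify talktome.pl''), but the cheapest kind of tampering - junk stuck on the top - can be spotted with a few lines of Perl. This is only a sketch; the filename is whatever you saved the download under:

```perl
use strict;
use warnings;

# Sketch only: confirm the first line of the downloaded file is the
# expected shebang, i.e. that nothing has been stuck on the top.
# The real cryptographic check is "gpg --verify talktome.pl".
sub shebang_ok {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    my $first = <$fh>;
    close $fh;
    return defined $first
        && $first =~ m{^#! /usr/bin/perl -Tw};
}
```

This catches a prepended block of text, but not extra PGP data blocks further down - that part really does need gpg.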
The original plan was to compile most of the code inside a Safe module compartment, which promises you that certain operations cannot be performed at runtime.
At the moment, this has to take a back seat to getting some sort of ``proof of concept'' program running.
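For what it's worth, the Safe compartment idea looks roughly like this. A sketch only - Safe's stock ``:default'' opcode mask stands in for the vetted policy the real program would need:

```perl
use strict;
use warnings;
use Safe;

# Sketch: code reval'd inside the compartment can only use a restricted
# opcode set (Safe's :default mask here - a real policy needs thought).
my $cpt = Safe->new;

my $sum = $cpt->reval('2 + 2');     # harmless arithmetic is fine
die "reval failed: $@" if $@;

# Filesystem writes simply do not compile inside the compartment:
$cpt->reval('unlink "precious-file"');
print $@ ? "blocked: $@" : "uh oh, that ran\n";
```

The nice property is that the forbidden operations fail at compile time inside the compartment, rather than being checked on each use at runtime.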
Anyway, despite the fact that there is no warranty (and there never will be, either), I will do my best to make sure this program doesn't do anything stupid to or with your computer. That's the best you can get at the moment - along with access to the source code of course.
Well, it depends on whether you think there's any value in what it does. Perhaps some comparisons will help clear things up?
The emacs command ``M-x spook'' adds a couple of lines of subversive junk text to a file, usually at the bottom of each email a person writes. This is also wasteful, e.g.
AFSPC argus threat cryptographic Compsec global military AGT. AMME explosion CDMA Kosovo ASDIC anarchy ARPA CipherTAC-2000
It's just there to attract unwelcome attention.
Compare also streaming audio. The internet isn't (currently) designed as a broadcast medium.
They could exclude the hits on the grounds that they come from a self-confessed robot. They should be careful not to exclude real humans pretending to be robots, though.
It's probably very easy, even for a computer, if you have access to the full captured data of a session.
However, AIUI the 2002 budget in the UK only allows for serious snooping on 1 in 10,000 users. For the rest, the data will just say how much data came and went, and when.
That depends how careful the human is. The robot will be essentially random, within some set of boundaries.
If the human doesn't cause the total behaviour of the system to stray outside what is statistically feasible, it could be very hard to tell the difference, unless you have something that can infer purpose from behaviour that is careful to stay mostly within one standard deviation.
I'm not helping them. The terrorists are quite capable of doing whatever they like - remember, they're not bound by the law the way innocent citizens are. It makes them much harder to track.
The current plan is to make a web crawler that obeys all the current rules on robot exclusion, and have it fetch a small quantity of data from a selection of web servers. It will identify itself, it will fetch pictures and other things embedded in HTML files, it will follow links and it will probably need to make a few queries to search engines too.
These are all the operations that a person does when surfing the net - except people don't obey robots.txt files and they don't usually surf 24/7!
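CPAN already has most of the plumbing for the fetching side. A minimal sketch using LWP::RobotUA - the agent name, contact address and start URL below are all placeholders, not anything the real program will use:

```perl
use strict;
use warnings;
use LWP::RobotUA;

# Sketch: LWP::RobotUA fetches and obeys robots.txt automatically,
# and rate-limits itself per host.  Name and address are made up.
my $ua = LWP::RobotUA->new(
    agent => 'talktome/0.0',              # placeholder identity
    from  => 'robot-owner@example.org',   # placeholder contact address
);
$ua->delay(1);    # delay() is in minutes: at most one request per minute per host

my $resp = $ua->get('http://www.example.org/');
print $resp->is_success
    ? "fetched " . length($resp->decoded_content) . " bytes\n"
    : "failed: " . $resp->status_line . "\n";
```

Following links and pulling in embedded images would sit on top of this, probably via something like HTML::LinkExtor on each fetched page.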
This is hardly a detailed design, I know. I haven't written the program yet.
It is my aim to swap a small quantity of random and pointless data with as many computers as possible, on a regular basis.
In theory my 128k cable modem can shift about one gigabyte of data per day. Most home broadband ISPs start getting upset when one uses more than about 2% (50:1) of the available bandwidth all the time, so perhaps I shall aim for one tenth of that limit. That still leaves plenty of bandwidth for my personal use - I'm not a heavy data muncher.
That's about 2.6 megabytes per day, to or from my machine and some other places.
Divided between 1000 computers, this is under 3kb each. It's not enough to give anyone cause to be grumpy, except the nosey folk who wish to record its passing and sit on the data for seven years.
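A quick back-of-envelope check of those figures - approximate, since it depends what exactly ``128k'' and ``one gigabyte'' mean:

```perl
use strict;
use warnings;

# Rough check of the budget above: 128 kbit/s flat out, the 2% (50:1)
# ceiling, then a tenth of that, split across 1000 hosts.
my $bytes_per_day = 128_000 / 8 * 86_400;  # ~1.38e9 - "about a gigabyte"
my $isp_ceiling   = $bytes_per_day / 50;   # 2% of the link
my $budget        = $isp_ceiling / 10;     # ~2.8 MB/day, near the 2.6 above
my $per_host      = $budget / 1000;        # just under 3 kB each

printf "budget: %.1f MB/day, %.2f kB per host\n",
    $budget / 1e6, $per_host / 1e3;
```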
It is, however, enough to send a fairly substantial email. The data doesn't have to be ignored at the other end, if someone is expecting it.
There is no code yet.
(c) 2002 Matthew Astley
$Id: talktome.pl,v 1.12 2003/01/19 18:56:30 mca1001 Exp $