[PROGRAMMING] Writing a spider

Well, I know there are a lot of spiders and spider toolkits out there, but I want to write one that logs into a website (i.e. it has to handle cookies and HTTPS). And since this isn't the focus of my work, I'd like it to be minimally complex. Does anyone know of any toolkits/libraries out there for this purpose? Preferred language: Java.

Thanks

Re: [PROGRAMMING] Writing a spider

Itsy Bitsy Spider Climbed up the Water Spout....

Re: [PROGRAMMING] Writing a spider

thanks..that helps..

Re: [PROGRAMMING] Writing a spider

Ravage, don't be upset, brother... I am just joshing.

Re: [PROGRAMMING] Writing a spider

I wasn't, I was joshing too..

Someone please tell me about the spidering.. I don't want to learn how to handle cookies at this age..

I love this new site.. have they used AJAX?

Re: [PROGRAMMING] Writing a spider

I think I'd better find out what the hell a spider is first... :-)

Re: [PROGRAMMING] Writing a spider

Heheh.. crawler, spider, web agent, worm.. call it what you want..

Any good API for manipulating HTML pages, HTTP/HTTPS connections and cookies (on the client side) will do fine..

Re: [PROGRAMMING] Writing a spider

Only a clue..
Look at Python or Perl.
Those people do lots of such things.

Re: [PROGRAMMING] Writing a spider

FYI, HttpUnit from http://httpunit.sourceforge.net is the most elegant solution for this purpose that I've seen.

Re: [PROGRAMMING] Writing a spider

Have you looked into the Heaton API package? I used it a while back when teaching Java threads.

I'm not sure if you'll find it online, but searching for com.heaton.bot might turn something up. The library isn't the most elegant in its semantics, but it's at least functional.

Re: [PROGRAMMING] Writing a spider

I came across a whole bunch of Java-based robots/web agents, but I wasn't exactly looking for a usual web crawler. Basically, I wanted to automatically log into a website and browse through it, which is what HttpUnit is built for, though their motive is to use it with JUnit for testing websites programmatically.
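
For anyone who lands on this thread later: here is a rough sketch of the "log in, keep the cookie, then browse" idea using only the JDK (java.net.http.HttpClient plus java.net.CookieManager, so it assumes Java 11 or newer). This is not HttpUnit's API, just a plain-JDK stand-in. The class name LoginSpiderSketch, the /login and /members paths, the form field names and the little local demo site are all made up for illustration; a real site will have its own login form and field names. The same client should also accept https:// URLs with the default TLS settings.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class LoginSpiderSketch {

    // Client that stores cookies from responses and replays them on later requests.
    // ACCEPT_ALL is permissive and chosen only to keep the demo simple.
    static HttpClient newCookieAwareClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .build();
    }

    // POSTs form-encoded credentials; any Set-Cookie in the reply lands in the cookie store.
    // The field names "user" and "password" are assumptions, not a real site's form.
    static int login(HttpClient client, String loginUrl, String user, String password)
            throws Exception {
        String form = "user=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    // GETs a page; the client attaches whatever cookies it holds for that host.
    static String fetch(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Hypothetical stand-in for the real website so the sketch runs offline.
    static HttpServer startDemoSite() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/login", exchange -> {
            exchange.getRequestBody().readAllBytes(); // credentials are not checked here
            exchange.getResponseHeaders().add("Set-Cookie", "session=abc123; Path=/");
            exchange.sendResponseHeaders(204, -1);
            exchange.close();
        });
        server.createContext("/members", exchange -> {
            String cookie = exchange.getRequestHeaders().getFirst("Cookie");
            boolean loggedIn = cookie != null && cookie.contains("session=abc123");
            byte[] body = (loggedIn ? "members page" : "please log in")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(loggedIn ? 200 : 403, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Fetches the protected page before and after logging in with the same client.
    static String[] runDemo() throws Exception {
        HttpServer site = startDemoSite();
        try {
            String base = "http://127.0.0.1:" + site.getAddress().getPort();
            HttpClient client = newCookieAwareClient();
            String before = fetch(client, base + "/members");
            login(client, base + "/login", "ravage", "secret");
            String after = fetch(client, base + "/members");
            return new String[] { before, after };
        } finally {
            site.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String[] pages = runDemo();
        System.out.println("before login: " + pages[0]);
        System.out.println("after login:  " + pages[1]);
    }
}
```

The point of the sketch is that the cookie handling lives in the client, so the spidering code never touches Set-Cookie or Cookie headers itself. Swapping the demo site for a real one would mostly mean changing the URLs and the form fields, though sites with hidden form tokens or JavaScript-driven logins will likely need more than this.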