I know there are a lot of spiders and spider toolkits out there, but I want to write one that logs into a website (i.e. one that handles cookies and HTTPS). Since this isn't the focus of my work, I'd like it to be minimally complex. Does anyone know of any toolkits or libraries for this purpose? Preferred language: Java.
Have you looked into the Heaton API package? I used it a while back when teaching Java threads.
I'm not sure you'll find it online, but searching for com.heaton.bot might turn something up. The library isn't the most elegant in its semantics, but it's at least functional.
I came across a whole bunch of Java-based robots and web agents, but I wasn't really looking for a typical web crawler. I basically wanted to log into a website automatically and browse through it, which is what HTTPUnit is built for, though its aim is to work alongside JUnit for testing websites programmatically.
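For anyone who wants to see what "log in, then browse with the session cookie" looks like without pulling in a library, here is a rough sketch using only JDK classes (java.net.CookieManager and HttpURLConnection, Java 11 or later). This is not HTTPUnit's API. The class name LoginSession, its method names, and any URLs or form field names you pass in are made up for illustration, and a real site may also need hidden form fields or redirect handling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: keeps cookies between requests so a login "sticks".
public class LoginSession {

    public LoginSession() {
        // Install a JVM-wide cookie store; HttpURLConnection (and its HTTPS
        // subclass) will then save Set-Cookie headers and resend the cookies.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    // Encode form fields as application/x-www-form-urlencoded.
    public static String formEncode(Map<String, String> fields) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    // POST the login form; returns the HTTP status code.
    public int post(String url, Map<String, String> fields) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formEncode(fields).getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    // GET a page; cookies stored by earlier requests are sent automatically.
    public String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Usage would be something like creating one LoginSession, calling post() with the site's login URL and its form fields, then calling get() on the pages behind the login. HTTPS URLs work the same way as long as the server's certificate is trusted by the JVM.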