
YiJing::0x11 - Gu\/ (Web Crawler)

"Beware: There is only a thin line between a crawler and a worm!"
蠱。元亨。利涉大川。先甲三日。後甲三日。
Web crawler: Nice and fun. Suitable to sift in the great Data Flow. Test run for three days before sending it out; analyze the data for three days before sending it out again.
彖曰。蠱。剛上而柔下。巽而止。
蠱。元亨。而天下治也。利涉大川。往有事也。
先甲三日。後甲三日。終則有始。天行也。
This hexagram is emblematic of the trouble that you would face in writing or managing a web crawler: the program has to strike out on its own and unobtrusively sift great gobs of data in any number of messy formats.
It needs testing and retesting, planning and monitoring. It has to follow old standards and accept new ones, and tolerate sites that don't follow standards at all. To do its work, the web crawler has to get as much data from as many sites as it can, without bothering any webmasters in the process.
It has to be efficient, but deliberate. It is a matter of contradictory goals -- a situation that comes up in all sorts of systems besides web crawlers.
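One concrete way to "follow old standards" without bothering webmasters is to consult a site's robots.txt before every request. The sketch below uses Python's standard urllib.robotparser. The robots.txt contents, the user-agent name "gu-crawler", the example.com URLs, and the one-second default delay are assumptions made up for illustration, not anything this text prescribes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. A real crawler would download this file
# from the site root before requesting anything else from that site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "gu-crawler"  # hypothetical crawler name


def make_policy(robots_text):
    """Parse robots.txt text into a policy object that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(policy, url):
    """Return True if the site's rules allow this crawler to fetch url."""
    return policy.can_fetch(USER_AGENT, url)


def polite_delay(policy, default=1.0):
    """Seconds to wait between requests: the site's Crawl-delay if it
    declares one, otherwise an assumed default."""
    delay = policy.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default


policy = make_policy(ROBOTS_TXT)
for url in ("http://example.com/index.html",
            "http://example.com/private/data.html"):
    verdict = "fetch" if may_fetch(policy, url) else "skip"
    print(verdict, url, "- wait", polite_delay(policy), "s between requests")
```

Sites that publish no rules at all still deserve a delay, which is why polite_delay falls back to a default instead of zero.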
象æ°ã山䏿颍ãè ±ãååä»¥æ¯æ°è²å¾·ã
Gathering Data under Standards, is the Image of a Web Crawler. A wise hacker makes careful use of it to provide people with interesting information while maintaining the proper ethics.
初六。干父之蠱。有子。考無咎。厲終吉。
象曰。干父之蠱。意承考也。
Crawling and the Data.
... The web crawler will harvest some bad data. Make sure it can recover well and move on correctly.
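A minimal sketch of recovering from bad data, assuming the crawler's job at this step is to pull links out of a fetched page. It decodes broken bytes leniently, lets the standard html.parser cope with sloppy markup, and drops any single link that cannot be resolved instead of abandoning the page. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links, skipping anything unusable."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # anchor without a usable target
        try:
            absolute = urljoin(self.base_url, href)
            scheme = urlsplit(absolute).scheme
        except ValueError:
            return  # malformed URL: drop this link, keep the page
        if scheme in ("http", "https"):
            self.links.append(absolute)


def extract_links(raw_bytes, base_url):
    """Return the usable links in a page, however damaged the page is."""
    # Undecodable bytes become U+FFFD instead of raising an exception.
    text = raw_bytes.decode("utf-8", errors="replace")
    collector = LinkCollector(base_url)
    collector.feed(text)
    collector.close()
    return collector.links


page = (b'<a href="/ok">fine</a>'
        b'<a href="mailto:me@example.com">not a page</a>'
        b'<a href="http://[broken">bad URL</a>'
        b'\xff<a href="next.html">never closed')
print(extract_links(page, "http://example.com/dir/page.html"))
```

The bad entries are discarded one at a time, so a single malformed link or stray byte costs only itself and the crawl moves on.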
九二。干母之蠱。不可貞。
象曰。干母之蠱。得中道也。
Crawling and the Network.
... The web crawler should back off from network trouble, wait, compromise, and improvise.
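Backing off and waiting is commonly done with exponential backoff plus random jitter. The sketch below is one such scheme, not necessarily the only reasonable one. The function name, the retry limits, and the choice to treat OSError (which covers socket and urllib network errors) as "network trouble" are assumptions for illustration. The fetch callable is supplied by the caller.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch(url), retrying on network errors with growing pauses.

    The pause before retry number n is a random value between 0 and
    min(max_delay, base_delay * 2**n), so many crawler workers hitting the
    same failing host do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of patience: let the caller decide what to do
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


# Usage with a stand-in fetch function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_fetch(url):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("connection reset")
    return "page body"


pauses = []  # record the pauses instead of really sleeping
print(fetch_with_backoff(flaky_fetch, "http://example.com/", sleep=pauses.append))
print(len(pauses), "pauses before success")
```

Passing sleep as a parameter is only a testing convenience. In production the default time.sleep applies, and a URL that still fails after the last attempt can be put back on the queue for a later crawl.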
九三。干父之蠱。小有悔。無大咎。
象曰。干父之蠱。終無咎也。
Fine tuning.
... You'll have to fix some mistakes that shouldn't have been made, but it's no big deal.
六四。裕父之蠱。往見吝。
象曰。裕父之蠱。往未得也。
Obvious and oblivious.
... The implementation is nice and simple, and dangerously wrong. Watch it upset everyone!
六五。干父之蠱。用譽。
象曰。干父之蠱。承以德也。
Public attention.
... Make it clear that you're listening to what people say. The web crawler depends on the kindness of strangers.
上九。不事王侯。高尚其事。
象曰。不事王侯。志可則也。
Stepping back.
... You should act on principle despite the authority's demands, so that you can serve a higher goal.