A website classification system was developed for a client that needed to decide automatically whether specific websites were appropriate places to advertise.
The analysis first required a general way to detect template-based websites, so a meta-language was developed for high-level pattern matching on website structure.
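The meta-language itself is not described here, so the following is only a minimal sketch of the general idea, assuming a template can be expressed as a set of wildcard patterns over element tag paths. The pattern syntax, the names `PathCollector`, `structure_paths`, `matches_template`, and `BLOG_TEMPLATE`, and the sample page are all hypothetical and stand in for whatever the real meta-language provided.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class PathCollector(HTMLParser):
    """Records the tag path (e.g. 'html/body/div/h2') of every element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.paths = []   # one path per element encountered

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def structure_paths(html):
    """Return the tag path of every element in the document, in order."""
    collector = PathCollector()
    collector.feed(html)
    return collector.paths


def matches_template(html, patterns):
    """True if every pattern matches at least one element path.

    Patterns use shell-style wildcards; note that '*' also matches '/',
    so it can span several nesting levels.
    """
    paths = structure_paths(html)
    return all(any(fnmatchcase(p, pat) for p in paths) for pat in patterns)


# Hypothetical template: a blog-like page with an article body and a footer.
BLOG_TEMPLATE = ["html/body/div/article/h2", "html/body/div/article/p", "*/footer"]

page = (
    "<html><body><div><article><h2>Title</h2><p>Text<br>more</p></article></div>"
    "<footer>x</footer></body></html>"
)
print(matches_template(page, BLOG_TEMPLATE))
```

Matching on tag paths rather than on text keeps the check independent of page content, which is the property a template detector needs. A real meta-language would likely also constrain attributes, ordering, and repetition.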
Classification was then performed by applying latent semantic analysis to specific parsed sections of the matching websites.
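As a rough illustration of latent semantic analysis used for classification, the sketch below builds a term-document count matrix, truncates its singular value decomposition, and labels new text by its nearest labeled neighbor in the reduced space. It assumes NumPy is available. The training snippets, the labels, the choice of `k=2`, and the nearest-neighbor rule are all assumptions made for illustration, not details of the original system.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_lsa(docs, k=2):
    """Build a term-document count matrix and keep its top-k left singular vectors."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for term, count in Counter(tokenize(d)).items():
            A[index[term], j] = count
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return index, U[:, :k]


def project(text, index, U_k):
    """Map text into the k-dimensional latent space; unknown terms are ignored."""
    q = np.zeros(len(index))
    for term, count in Counter(tokenize(text)).items():
        if term in index:
            q[index[term]] += count
    return U_k.T @ q


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def classify(text, labeled_docs, index, U_k):
    """Return the label of the labeled document most similar to the text."""
    target = project(text, index, U_k)
    best = max(labeled_docs, key=lambda pair: cosine(target, project(pair[0], index, U_k)))
    return best[1]


# Hypothetical parsed page sections with hand-assigned labels.
labeled = [
    ("cheap flights hotel deals travel", "travel"),
    ("travel guide hotel reviews flights", "travel"),
    ("poker casino betting odds", "gambling"),
    ("casino slots betting bonus poker", "gambling"),
]
index, U_k = fit_lsa([text for text, _ in labeled], k=2)
print(classify("hotel flights booking", labeled, index, U_k))
print(classify("poker betting tips", labeled, index, U_k))
```

Running the decomposition only on the parsed sections that the template match selects, rather than on whole pages, keeps navigation and boilerplate text out of the term counts.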