B: Web Scraping Only a Specific Domain

Sunday, 8 September 2013

I am trying to make a web scraper that, for this example, scrapes news
articles from Reuters.com. I want to get the title and date of each
article. I know I will ultimately just have to pull the source code from
each address and then parse the HTML using something like JSoup.
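
A minimal sketch of the "pull the source code" step follows. It assumes Java 11+ and uses only the JDK's java.net.http client. The class name PageFetcher and the User-Agent value are hypothetical placeholders, not anything Reuters requires:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PageFetcher {
    // One shared client; follows normal redirects (e.g. http -> https).
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Builds a GET request; the User-Agent string is a hypothetical placeholder.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "ExampleScraper/0.1")
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();
    }

    // Returns the raw HTML of the page, or throws if the status is not 200.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpResponse<String> resp =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() != 200) {
            throw new IOException("HTTP " + resp.statusCode() + " for " + url);
        }
        return resp.body();
    }
}
```

The returned String could then be handed to JSoup with Jsoup.parse(html, url). On the resulting document, doc.title() gives the text of the page's title element, which may differ from the on-page headline. Which element holds the publication date depends on the site's markup, so you would need to check it by inspecting an actual article page.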
My question is: how do I make sure I do this for every news article on
Reuters.com? How do I know I have reached all the reuters.com addresses?
Are there any APIs that can help me with this?
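
There is no general way to be certain a crawl has found every page on a site. A link-following crawler only reaches pages that are linked, directly or indirectly, from its starting points. Below is a minimal sketch of a breadth-first crawl restricted to one domain, in plain Java. The link extractor is passed in as a function (a hypothetical hook) so the traversal can be shown without network access. In practice it could fetch each page and collect absolute link targets, for example with JSoup's select("a[href]") and absUrl("href"). If a site publishes an XML sitemap (sitemaps.org protocol), that can be another source of article URLs. This sketch does not assume whether, or where, Reuters publishes one.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

class DomainCrawler {

    // True if the URI's host is exactly `domain` (given in lowercase) or a
    // subdomain of it: www.reuters.com matches "reuters.com", evilreuters.com does not.
    static boolean inDomain(URI uri, String domain) {
        String host = uri.getHost();
        if (host == null) {
            return false; // relative or opaque URIs (e.g. mailto:) have no host
        }
        host = host.toLowerCase(Locale.ROOT);
        return host.equals(domain) || host.endsWith("." + domain);
    }

    // Drops any #fragment so "page#top" and "page" count as the same page.
    static URI stripFragment(URI uri) {
        String s = uri.toString();
        int hash = s.indexOf('#');
        return hash < 0 ? uri : URI.create(s.substring(0, hash));
    }

    // Breadth-first crawl from `seed`, following only links inside `domain` and
    // visiting at most `maxPages` pages. `linksOf` returns the absolute links on
    // a page (hypothetical hook: fetch + parse in a real crawler).
    static Set<URI> crawl(URI seed, String domain, int maxPages,
                          Function<URI, List<URI>> linksOf) {
        Set<URI> visited = new LinkedHashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        queue.add(stripFragment(seed));
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI page = queue.poll();
            // Skip an off-domain seed and any page already visited.
            if (!inDomain(page, domain) || !visited.add(page)) {
                continue;
            }
            for (URI link : linksOf.apply(page)) {
                URI next = stripFragment(link);
                if (inDomain(next, domain) && !visited.contains(next)) {
                    queue.add(next);
                }
            }
        }
        return visited;
    }
}
```

For example, crawl(URI.create("https://www.reuters.com/"), "reuters.com", 500, linksOf) would return the pages reached, in visiting order. The maxPages bound keeps the crawl finite. A page linked only from another domain is never found this way, which is why a link crawl alone cannot prove completeness.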
Thank you very much, Rich