RE: Screen scraping from VFP (msg#02177, db.foxpro.profox)
> Wondered whether I should make this NF or not, but seeing how
> it'll be done from VFP, I figured "yeah, it's on topic."
>
> There's been talk recently and in the past about
> screen-scraping web pages. Does anyone have a "best
> practice" way of doing this?

I've wrestled with this one, to pull a list of RV sites in the US into VFP. At first it looked like a cakewalk, because name, address, tel and contact info were arranged vertically on the (long) page, separated by blank lines. I think it was provided as one state per (long) page.

What I did was, a page/state at a time, copy the page to the clipboard and then run the VFP process that read the clipboard and parsed its contents into individual records.

After about 100 problems, I finally got it to work, more or less, because the data on these pages is not necessarily structured in the way it appears to be. In the cases I ran with, sometimes there would be one blank line separator, sometimes multiple blank lines. Fine, fix that, then discover the data has never been validated, so it's incomplete and contains missing/transposed fields, etc. It basically required manual cleaning afterwards.

And this was a case where the data appeared to be structured and amenable to screen scraping.

I suppose the flip side is where data IS properly structured and formatted/validated, perhaps by organizations intending to distribute data this way, but then you'd think they would support other ways to get it than screen scraping.

Bill

> tia,
> --Michael

_______________________________________________
Post Messages to: ProFox@xxxxxxxxx
Subscription Maintenance: http://leafe.com/mailman/listinfo/profox
OT-free version of this list: http://leafe.com/mailman/listinfo/profoxtech
** All postings, unless explicitly stated otherwise, are the opinions of the author, and do not constitute legal or medical advice. This statement is added to the messages for those lawyers who are too stupid to see the obvious.
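For anyone trying the clipboard approach Bill describes, here is a rough sketch of the parsing step. It is in Python rather than VFP and is not Bill's code. The field order in `FIELDS`, the function names and the sample text are all made up for illustration. The idea is to treat any run of blank (or whitespace-only) lines as one separator, then set aside records whose line count doesn't match the expected layout so they can be cleaned by hand.

```python
import re

# Hypothetical field order; the real pages may list fields differently.
FIELDS = ("name", "address", "tel", "contact")


def split_records(text):
    """Split pasted page text into records separated by one or more blank lines."""
    # Normalize line endings, since clipboard text often uses CRLF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A separator is a newline, optional whitespace (including more newlines), a newline.
    blocks = re.split(r"\n\s*\n", text.strip())
    return [
        [line.strip() for line in block.split("\n") if line.strip()]
        for block in blocks
        if block.strip()
    ]


def to_rows(text):
    """Return (good, suspect): complete records as dicts, the rest as raw line lists."""
    good, suspect = [], []
    for lines in split_records(text):
        if len(lines) == len(FIELDS):
            good.append(dict(zip(FIELDS, lines)))
        else:
            # Missing or extra lines: leave these for manual review.
            suspect.append(lines)
    return good, suspect


if __name__ == "__main__":
    # Invented sample data with uneven blank-line separators and one short record.
    sample = (
        "Happy Trails RV\n12 Main St\n555-0100\nPat\n\n\n\n"
        "Lakeside Camp\n555-0199\n  \n"
        "Pine Rest\n1 Pine Rd\n555-0123\nSam\n"
    )
    good, suspect = to_rows(sample)
    print(len(good), "complete,", len(suspect), "need review")
```

A line-count check only catches missing or extra fields. Transposed fields (say, a phone number where the address should be) would still need per-field validation or a manual pass, which matches Bill's experience.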