
RE: Screen scraping from VFP: msg#02177

db.foxpro.profox

Subject: RE: Screen scraping from VFP


> Wondered whether I should make this NF or not, but seeing how
> it'll be done from VFP, I figured "yeah, it's on topic."
>
> There's been talk recently and in the past about
> screen-scraping web pages. Does anyone have a "best
> practice" way of doing this?


I've wrestled with this one, to pull a list of RV sites in the US into
VFP.

At first it looked like a cakewalk, because name, address, tel, contact
info was arranged vertically on the (long) page separated by blank
lines. I think it was provided as one state per (long) page.

What I did was, a page/state at a time, copy the page to the clipboard
and then run the VFP process that read the clipboard and parsed its
contents into individual records.

After about 100 problems, I finally got it to work, more or less. The
data on these pages is not necessarily structured the way it appears to
be. In the cases I ran, sometimes there would be one blank line as a
separator, sometimes several. Fine, fix that, and then you discover the
data has never been validated: records are incomplete, fields are
missing or transposed, etc. It basically required manual cleaning
afterwards.
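
For what it's worth, here is a rough sketch of that blank-line parsing.
It is in Python rather than VFP, and it is not the code I actually ran.
The field order (name, address, tel, contact), the function name
parse_blocks and the sample text are all assumptions made up for
illustration. It treats any run of blank lines as one separator and
flags records with the wrong number of lines for manual review. It
cannot spot transposed fields; those still need eyeballing.

```python
import re

# Assumed field order, one field per line; the real pages may differ.
EXPECTED_FIELDS = ("name", "address", "tel", "contact")


def parse_blocks(text):
    """Split pasted page text into records separated by blank lines."""
    # Normalize line endings, since clipboard text often uses CR/LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank (or whitespace-only) lines count as one separator.
    blocks = re.split(r"\n\s*\n", text.strip())
    records = []
    for block in blocks:
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        record = dict(zip(EXPECTED_FIELDS, lines))
        # A wrong line count means a missing or extra field: flag it.
        record["needs_review"] = len(lines) != len(EXPECTED_FIELDS)
        records.append(record)
    return records


# Made-up sample: three records, the second one incomplete.
sample = (
    "Camp A\n1 Main St\n555-0100\nAnn\n"
    "\n\n\n"
    "Camp B\n2 Oak Rd\n"
    "\n"
    "Camp C\n3 Elm St\n555-0102\nCy\n"
)

for rec in parse_blocks(sample):
    print(rec)
```

Flagging instead of silently dropping the short records keeps the
manual clean-up step visible, which is where most of the time went.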

And this was a case where the data appeared to be structured and
amenable to screen scraping.

I suppose the flip side is where data IS properly structured and
formatted/validated, perhaps by organizations that intend to distribute
data this way, but then you'd think they would support other ways to get
it than screen scraping.


Bill



> tia,
> --Michael
>



_______________________________________________
Post Messages to: ProFox@xxxxxxxxx
Subscription Maintenance: http://leafe.com/mailman/listinfo/profox
OT-free version of this list: http://leafe.com/mailman/listinfo/profoxtech
** All postings, unless explicitly stated otherwise, are the opinions of the
author, and do not constitute legal or medical advice. This statement is added
to the messages for those lawyers who are too stupid to see the obvious.


