Need to Grab Lots of Data


Recommended Posts

So I'll preface this by saying I don't have very good programming skills, but if it's explained well enough, I might be able to walk myself through it.

I want to "scrape" about 10 years of upper air data for a number of stations and dump just a handful of parameters into an Excel spreadsheet. I have about 20 stations to target and I just want 850 mb T and 500 mb height.

I can get the upper air data either off of U of Wyoming or Plymouth State.

http://vortex.plymou...asoundlist.html

http://weather.uwyo....r/sounding.html

From Plymouth I can only get one day at a time; from Wyoming, one month at a time. Either way, it's a manual effort to go day by day and input all of the parameters into a spreadsheet. I can do maybe one year per station per day.

Since the output html is identical for both websites...

Wyoming for instance is: http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=YYYY&MONTH=MM&FROM=DDHH&TO=DDHH&STNM=STATN

The placeholders in caps (YEAR, MONTH, FROM/TO, STNM) are the only parts that change: year, month, day/hour (I just want 00z), and station #.

Since it's a predictable web address, I'm assuming there could be a way to write something, maybe a macro in Excel, to go out and fetch data from these websites in sequential order. I just have no idea how one would write this. I could fetch all the data, then do a sort, and voila.
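To make that concrete, here's the kind of loop I'm picturing, sketched in Python just to illustrate (I couldn't write the real thing myself; station 72518 and the years are made up, and since FROM/TO span the whole month, 12z soundings would come back too and need filtering out later):

    # Rough idea only: build each station-month URL from the pattern above.
    # 72518 and the year range are placeholders.
    base = ("http://weather.uwyo.edu/cgi-bin/sounding?region=naconf"
            "&TYPE=TEXT%3ALIST&YEAR={year}&MONTH={month:02d}"
            "&FROM=0100&TO=3112&STNM={station}")

    for year in range(2003, 2013):
        for month in range(1, 13):
            print(base.format(year=year, month=month, station=72518))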

I guess alternatively I could just export all the Wyoming data month by month manually into Excel and sort, which *could* work. But this seems more fun. If anyone has any ideas, I'd appreciate it. Thanks.



WGET and batch files... or a quick custom program... would work better than Excel. Getting the HTML into Excel is another issue altogether.
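A quick custom program really can be quick here. As a rough Python sketch (the station list, years, and file names are placeholders; the URL pattern is lifted from the first post), something like this pulls one station-month per request and saves the raw HTML for parsing later:

    import urllib.request

    # Sketch: fetch one station-month per request and save the raw page.
    # Station numbers and years are placeholders.
    stations = ["72518", "72528"]
    pattern = ("http://weather.uwyo.edu/cgi-bin/sounding?region=naconf"
               "&TYPE=TEXT%3ALIST&YEAR={y}&MONTH={m:02d}"
               "&FROM=0100&TO=3112&STNM={s}")

    for s in stations:
        for y in range(2003, 2013):
            for m in range(1, 13):
                url = pattern.format(y=y, m=m, s=s)
                with open("{}_{}_{:02d}.html".format(s, y, m), "wb") as f:
                    f.write(urllib.request.urlopen(url).read())

The same loops, minus the fetch, could just write the URLs to a text file for wget -i urls.txt if you'd rather let wget do the downloading.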



Windows? What programming languages are you more comfortable with, or at least have some experience in? What are the uses for the pulled data?

Table parsing can be a bit hair-pulling, but the good news is that it's a fixed-width format. If you have done VBScript (for macro programming), you can check the WinHttpRequest object, which will help you pull the data from a URL.
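To sketch the fixed-width part in Python rather than VBScript (the 7-character column widths and the "Observations at" header line are my reading of the Wyoming text output, so verify both against a live page):

    import re

    # Sketch: walk one saved Wyoming month page and keep 850 mb temperature
    # and 500 mb height for the 00z soundings only. Assumes 7-character
    # columns in the order PRES HGHT TEMP ... -- check widths on a real page.
    def parse_month(text):
        rows = []                        # (timestamp, t850_c, z500_m)
        stamp = t850 = z500 = None
        for line in text.splitlines():
            m = re.search(r"Observations at (\d{2}Z \d{2} \w{3} \d{4})", line)
            if m:                        # header line starts a new sounding
                if stamp and stamp.startswith("00Z"):
                    rows.append((stamp, t850, z500))
                stamp, t850, z500 = m.group(1), None, None
            elif line[0:7].strip() == "850.0":
                t850 = float(line[14:21])    # TEMP column, deg C
            elif line[0:7].strip() == "500.0":
                z500 = float(line[7:14])     # HGHT column, m
        if stamp and stamp.startswith("00Z"):
            rows.append((stamp, t850, z500))
        return rows                      # a missing level comes back as None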


Can you explain exactly which pages you want to scrape from and which fields?

As long as the list of pages can be formulated and the data on them is standardized, I could probably do this pretty easily. I'd scrape with Linux command-line tools (lynx, awk, sed), import into a MySQL db, then use PHPExcel to export from MySQL to your desired format.

Need to know exactly what data to collect and from what pages.



You can probably skip MySQL and PHPExcel if you save it as a CSV file.
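For example, reusing the hypothetical parse_month() sketched above, a few lines of Python write a CSV that Excel opens directly:

    import csv
    import glob

    # Sketch: one CSV row per 00z sounding, across every saved month file.
    # Assumes files named like 72518_2010_01.html from the fetch step, and
    # the parse_month() helper sketched earlier in the thread (paste it in).
    with open("soundings.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["station", "time", "t850_c", "z500_m"])
        for path in glob.glob("*_*_*.html"):
            station = path.split("_")[0]
            with open(path) as f:
                for stamp, t850, z500 in parse_month(f.read()):
                    writer.writerow([station, stamp, t850, z500])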


