Weather Service stops receiving satellite data



For all interested... I don't know the source of the problem, but I have worked on this data directly (note that the GOES Soundings products are one of many mentioned as missing... I personally put these into Operations at NESDIS... not blowing my horn, I'm just trying to let you know that I'm fairly familiar with the operations there). Given that I don't work there anymore AND don't know the precise problem, I can't speak with 100% certainty. But I can give you a better idea than the b.s. I see floating around (like, up-thread, apparently AccuWeather stating that this has no impact on the ECMWF... nothing could be further from the truth). So, let me point out a few things (again, I'm not 100%, absolutely certain of these, since things might've changed since I was there; but I'd put money on it that I'm at least in the ballpark on most of these items)...

1) Whatever failed is massive. This never happens. I've met my share of govt workers who meet the negative stereotypes that some anti-govt political types have. But they're the exception. And most sys admin types are VERY on the ball. Even our R&D systems in NESDIS never went down for this long. For a segment of NESDIS operations to crash for DAYS... this is a big deal. Unheard of.

It wasn't a crash, but it was a very unfortunate, high-impact event. That's all I want to say about that.

2) However... raw data is not impacted. The satellite ingest comes in at Wallops and feeds to NESDIS. Moreover, NESDIS' SSD server is up and functioning properly. So, have no fear about permanent data loss. For example, we've been bemoaning this failure over in the Eurasian snow cover thread, as the National Ice Center is one site that's down. Well, have no fear. This outage is VERY frustrating (believe me, I know... I'm one of few who focuses on mid-Oct snow... not snow increase across the whole month; so, for me, this is the most critical time... this outage is killing me!), but all the raw data is there and recoverable. NatIce (and OSPO, whose SST site is also down) should be able to recover and reprocess all data once they're back up and running.

3) I'm not going to crucify AccuWeather, as I didn't see their comment firsthand. But let me make one thing clear... this data feed failure will impact the ECMWF model MORE than the GFS, not less. NCEP is concerned about the poor (relative to non-remotely sensed data) signal-to-noise ratio of satellite data and gives it low weight in the NWP models. The ECMWF takes a different view... while they recognize the limitations of the data, they believe it to be better than nothing in data void regions (oceans). They are MUCH more robust in utilizing satellite data... especially the derived products, which are failing to be delivered. If anyone tells you the EC is not impacted, they don't know what they're talking about. It is, in fact, impacted MORE than the NCEP models.

I won't comment on the actual model functionality, but data-wise, the EC and CMC and others possibly lost more data than NCEP, as NCEP has a direct link/circuit to NSOF that was unaffected. The outage to NCEP wasn't due directly to the NESDIS event, as when it happened at 20Z on the 20th, we lost a few types but were mostly fine. Then 23 hours later, we lost connectivity to two of their systems, a side effect from the major outage affecting everyone else. It was a tricky issue and took a day to troubleshoot and correct by NESDIS system/network engineers.

4) However, to clarify, I should possibly have said WERE being impacted. According to NWS notifications today, there is a backup, "prehistoric" feed now in place to NCEP (and probably to the EC). There was also an emergency network switch replacement performed this morning, which I suspect may be related to the problem (though I could be wrong, as the NatIce and OSPO sites are still down). So, I THINK any impacts on NWP are in the past, and things are rolling forward. That said, as I noted in the parenthetical, the NatIce and OSPO sites are still down. So, clearly, the issues are not completely resolved.

Things at NCEP are pretty much back to normal, and have been since around 00Z on 10/23, so I'm not sure what you mean about a prehistoric feed in place. What notifications did you see?

5) On that note, and referencing my first point on this being a pretty serious outage, I've not heard anyone mention a timeline for a return to service for the data feeds and web sites still off-line. Hopefully, this will be fixed quickly... as I said, such a failure is completely unprecedented. But don't hold your breath; NESDIS has some major problems they're trying to resolve right now. Be patient.

As a side note, NCEP makes public its model dump counts, hourly, and per model/cycle:

http://www.nco.ncep.noaa.gov/pmb/nwprod/realtime/

Thanks much for this detailed reply. You seem to be on the "inside" on this. Oh, by "prehistoric" feed I simply meant that the NWS, in one of their standard status messages regarding model production, made reference to some sort of back-up, text-based feed (I'd call that prehistoric, lol... my words, not theirs) being used to get the data into the models. I got the impression that this method didn't last long, as they got normal connectivity back shortly thereafter (and, as you state, everything's been fine with their connection for a while now).

Glad you can confirm the greater data loss to CMC and the EC (at least potentially). That makes perfect sense. I'm not a sys admin type, so I plead ignorance to the details, but I was involved in research-to-Ops transition at NESDIS long enough to know that there's some sort of dedicated line feed (NESDIS-NCEP). So, the notion that the EC would have a better feed into NESDIS than our own NCEP seemed ludicrous. And despite the tangent that "DTK" and I went off on, that was the origin of this discussion (at least for my commentary, as I was chiming in because an organization had stated that NCEP was impacted but the EC was not... which made little sense to me). Thanks again for your inside insight on this.

On that note, are you able to respond to GaWx's question? You seem to have more info than anyone on this board has on this. I'm astonished that NatIce and OSPO are still down (well, their web sites... I'm sure, internally, they're functioning at some level). The raw data seems to be flowing - as I've seen updated snow cover maps elsewhere. So, data seems fine... just public dissemination via the web interfaces is still down (and maybe other internal functionality that I'm in the dark on). Do you know or have you heard about anything in regard to a return to functionality of these sites? I've seen no notification on any NESDIS web sites regarding this outage... I'm a little surprised there hasn't been some general notification posted. So, I think we (outsiders) are all in the dark about when a return to normal might be expected. I'd love to know what you've heard on the matter - if anything.

Thanks again for your great insight on this!

Link to comment
Share on other sites

Well, it's not just NESDIS, http://www.ospo.noaa.gov/ isn't working either. 

 

I find this to be unbelievable. A major organization such as this and they can't keep their networks operational? They need to be more upfront about what's going on.

Steve,

Would you be able to hazard a guess as to when the daily Natice Eurasian snowcover data will start getting released again? I was hoping that this being the start of a new week may allow for some hope. However, I'm not betting on it!

Link to comment
Share on other sites

HA! If I had guessed, it would have been days ago! :-) 

 

I used to help manage a 600-server, 140-site network. That's one reason I find this so unbelievable. Never did we have an outage that took this long to fix. I'm not so sure it's a communications issue at this point. Replacing routers, switches, etc. is quick and easy, and issues like that can be identified immediately.

 

I would lean toward a major issue with one or more database servers, but again, I don't know how they are structured in terms of storage and servers, or whether it's shared and/or redundant.

 

They still need to be more upfront in my opinion.

Link to comment
Share on other sites

Just my humble two cents: it was a major hack job, and admitting the damage to the hackers becomes a security issue. And there's this:

 

http://www.oig.doc.gov/OIGPublications/OIG-14-025-A.pdf

 

 

I have just now emailed the Chief Information Officer at NOAA for information, and will post if I get a reply that is worth posting.

Link to comment
Share on other sites

Makes perfect sense.

Link to comment
Share on other sites

Despite not being in the hierarchy of their URL name, OSPO is in NESDIS. So, yep, it's just NESDIS. Actually, most NESDIS sites don't have NESDIS in their URL. For example, SSD (now called SPSD, I guess, based on the org structure list at the top of their page) is just www.ssd.noaa.gov ...and they, in fact, are a branch under OSPO, which is under NESDIS (see the org line at the top of their page: http://www.ssd.noaa.gov/). And yes, they are working... despite the outage taking down the main OSPO site, which I use for their SST plots, SSD (aka SPSD), part of OSPO, is still functional (at least their main page; I haven't gone through to see if all of the data is up to date).

Link to comment
Share on other sites

All the AMVs were missing, so they received less data than normal.

ALL of the ECMWF maps for data availability still show the AMVs with a big blank out in the eastern Pacific. Is it safe to assume the usually fine Euro is still degraded somewhat? Anybody know how much this matters to the model output?

 

http://old.ecmwf.int/products/forecasts/d/charts/monitoring/coverage/dcover!AMVs-Infrared!00!pop!od!mixed!w_coverage!latest!/

Link to comment
Share on other sites

Of course observations matter, but the observing system is robust and redundant. There is still plenty of coverage from other observations in the Pacific, including from polar orbiting satellites, aircraft, and as it turns out, the raw geostationary radiances:

 

http://old.ecmwf.int/products/forecasts/d/charts/monitoring/coverage/dcover!Geostationary-CSR!00!pop!od!mixed!w_coverage!latest!/

 

Impacts from missing observations will be felt across all operational centers.  There are some hints of minor degradations in short term skill in the PNA sector, but it's hard to know if this is from the missing observations or the actual predictability over the past week.

Link to comment
Share on other sites

The hackers don't need a PDF to figure out if there's a risk or not. They already know, or could find out easily enough on their own.
Link to comment
Share on other sites

NCEP shows 100% data availability for the 18z GFS Operational run today! This is the first time there has been no data of critical importance or data of opportunity reported as missing in their realtime data monitor for any GFS run since early last week. Even the 12z today continued to show a short list of missing/partial critical data, but the 18z lost all such lists.

 

Further, the NESDIS snow report server is back online, and I am quite certain that unless Rutgers has data backlog issues, they should have a new snow cover report out after midnight.

 

It may not make a lot of difference, given all the patching in and heroics performed by NCEP personnel to keep the GFS and other models on as even a keel as possible, but it is nice to know that it's back to the usual GFS that we all know and love!

 

The data availability table for the 12z GFS run:

 

http://www.nco.ncep.noaa.gov/pmb/nwprod/realtime/gfs/t12z/index.summary.shtml

 

The data availability table for the 18z GFS run:

 

http://www.nco.ncep.noaa.gov/pmb/nwprod/realtime/gfs/t18z/index.summary.shtml
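
For anyone who wants to check the other cycles without editing the address by hand, here is a minimal sketch that builds the summary page URL from the pattern in the two links above. Only the `gfs` 12z and 18z addresses appear in this thread; that the 00z and 06z cycles (or other model names) follow the same pattern is an assumption, and `summary_url` is just an illustrative helper name.

```python
# Base path taken from the links posted above.
BASE = "http://www.nco.ncep.noaa.gov/pmb/nwprod/realtime"

# Assumption: the four standard synoptic cycles all use the same URL layout.
VALID_CYCLES = (0, 6, 12, 18)


def summary_url(model: str, cycle_hour: int) -> str:
    """Build the data availability summary URL for one model cycle."""
    if cycle_hour not in VALID_CYCLES:
        raise ValueError(f"unexpected cycle hour: {cycle_hour}")
    # Cycle directories are written as t00z, t06z, t12z, t18z.
    return f"{BASE}/{model}/t{cycle_hour:02d}z/index.summary.shtml"


if __name__ == "__main__":
    for hour in VALID_CYCLES:
        print(summary_url("gfs", hour))
```

Comparing the 12z and 18z pages side by side is what shows the missing-data lists dropping out.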

Link to comment
Share on other sites

I'm only "on the inside" with respect to NCEP.  I'm about as much in the dark about NESDIS stuff outside of NCEP's realm as you are.  :)  Sounds like things are coming back though.

Link to comment
Share on other sites

  • 2 weeks later...

Now what?

 

000
NOUS74 KWNS 121120
ADMSPC

ADMINISTRATIVE MESSAGE
NWS STORM PREDICTION CENTER NORMAN OK
0520 AM CST WED NOV 12 2014

WE ARE AWARE OF AN ISSUE REGARDING OLD SPC PRODUCTS BEING
TRANSMITTED. THESE PRODUCTS DO NOT APPEAR TO BE ORIGINATING FROM THE
SPC AND WE ARE INVESTIGATING THE PROBLEM.

..SPC.. 11/12/2014


000
NOUS42 KWNO 121242
ADMNFD

SENIOR DUTY METEOROLOGIST NWS ADMINISTRATIVE MESSAGE
NWS NCEP CENTRAL OPERATIONS COLLEGE PARK MD
1240Z WED NOV 12 2014

LOOKS LIKE SPC IS NOT THE ONLY NCEP CENTER SEEING OLD PRODUCTS
BEING TRANSMITTED FROM UNKNOWN SOURCE.. PLEASE IGNORE ADMIN
MESSAGE FROM APRIL 26 2010.. SORRY.. NO TESTING IS BEING
PERFORMED THAT WE ARE AWARE OF AT NCEP.. NCF/TOC STILL
INVESTIGATING SOURCE OF THESE OLD TRANSMISSIONS.


NEWBY/SDM/NCO/NCEP

000
NOUS42 KWNO 121137
ADMNFD

SENIOR DUTY METEOROLOGIST NWS ADMINISTRATIVE MESSAGE
NWS NCEP CENTRAL OPERATIONS COLLEGE PARK MD
1132Z WED NOV 12 2014

RE: OLD SPC PRODUCTS BEING TRANSMITTED ...SPC..NCF/TOC SUPPORT
ARE LOOKING INTO THIS ISSUE(S) AND WILL HOPEFULLY ADDRESS THIS
ISSUE JUST AS SOON AS THEY CAN.. THANKS FOR YOUR PATIENCE WHILE
SUPPORT TRACKS DOWN THE SOURCE OF THESE TRANSMISSIONS.


SDM/NCO/NCEP
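
For readers unfamiliar with the header lines in these messages: a line such as `NOUS42 KWNO 121242` is a WMO abbreviated heading, made up of a data designator (two letters for the data type, two for the area, and a two-digit number), a four-letter originating center, and a day-hour-minute group in UTC. Below is a minimal sketch of a parser for that layout. The names `WmoHeading` and `parse_heading` are illustrative, not from any NWS tool, and the optional trailing group used for corrections and amendments is not handled.

```python
import re
from typing import NamedTuple


class WmoHeading(NamedTuple):
    data_type: str  # T1T2, e.g. "NO"
    area: str       # A1A2, e.g. "US"
    number: int     # ii, e.g. 42
    center: str     # CCCC originating center, e.g. "KWNO"
    day: int        # day of month (UTC)
    hour: int       # hour (UTC)
    minute: int     # minute (UTC)


# TTAAii CCCC YYGGgg, separated by single spaces.
_HEADING = re.compile(
    r"^([A-Z]{2})([A-Z]{2})(\d{2}) ([A-Z]{4}) (\d{2})(\d{2})(\d{2})$"
)


def parse_heading(line: str) -> WmoHeading:
    """Split one abbreviated heading line into its fields."""
    match = _HEADING.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognized heading: {line!r}")
    tt, aa, ii, cccc, day, hour, minute = match.groups()
    return WmoHeading(tt, aa, int(ii), cccc, int(day), int(hour), int(minute))


if __name__ == "__main__":
    for text in ("NOUS74 KWNS 121120", "NOUS42 KWNO 121242", "NOUS42 KWNO 121137"):
        print(parse_heading(text))
```

Read this way, the SPC message was issued on the 12th at 11:20 UTC (which matches its 0520 AM CST line), and the two NCEP messages at 12:42 and 11:37 UTC.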

Link to comment
Share on other sites

THANKS for posting!

Link to comment
Share on other sites

NCEP Operational Status Message
Wed Nov 12 14:12:21 2014 GMT

NOUS42 KWNO 121412
ADMNFD

SENIOR DUTY METEOROLOGIST NWS ADMINISTRATIVE MESSAGE
NWS NCEP CENTRAL OPERATIONS COLLEGE PARK MD
1406Z WED NOV 12 2014

UPDATE.. WITH REGARDS TO 2010 OLD PRODUCTS BEING DISSMINATED FOR
SPC..OPC AND THE SDM MESSAGES AND PRODUCTS..
TOC REMOVED THEIR TOC LINK TO CSC..WHICH THEY BELIEVE WAS THE
SOURCE OF THE OLD PRODUCTS FROM 2010 FROM SPC..OPC..AND THE SDM
PRODUCTS AND MESSAGES GOING TO THE FIELD..WEATHER WIRE. NO OTHER
INCIDENTS HAVE BEEN OBSERVED DURING THE PAST HR. WE WILL
CONTINUE TO MONITOR FOR NEW POSSIBLE INCIDENTS.

NEWBY/SDM/NCO/NCEP

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.
