Data Assimilation and NWP Q&A


dtk


What is 3DVAR? What is 4DVAR? Why is 4DVAR considered superior to 3DVAR?

Where does ensemble Kalman filtering fit into all of this?

Is getting the initial conditions right the biggest hold up for better models?

What other things would make models better? Can you explain why there are so many parameterizations?

For gridded models, can you explain how finite differencing is used for each time step? How does that compare with the time step calculations for spectral models?

Why do spectral models have their resolution expressed in numbers of triangles instead of distances?

First, let me just state that the questions you have listed above are covered in about 15 credit hours of graduate-school work, hundreds of peer-reviewed journal articles, numerous textbooks, etc. I'll do my best to keep things simple and answer questions (and try to keep the equations to a bare minimum).

First, what is data assimilation, analysis, and initialization? These are not necessarily all the same thing, and it's good if we can get some nomenclature sorted out to help keep the discussion clear. The term analysis generally refers to any attempt at approximating the true state of a given system (this can be done simply by looking at observations and doing a hand analysis, or by going full-blown data assimilation to come up with initial conditions for an NWP forecast or a reanalysis product). This thread has been prompted by questions about model initial conditions (ICs), so I'll keep the focus/discussion there. [aside: initialization is actually used to describe something else; I'll save that for much later].

For quite some time now, operational centers have used data assimilation to generate ICs for NWP. An analysis that uses time-distributed observations along with some dynamical model of the state is called assimilation. In other words, the model itself is used to help constrain the problem (in addition to the observations, of course). There are many flavors of data assimilation (Optimal Interpolation [OI], variational algorithms [3DVAR, 4DVAR], Kalman filters [Extended, Ensemble], etc.), each of which can actually be derived from a single, common source. In fact, if you make enough simplifying assumptions, it can be shown that they are all equivalent (if you apply them in particular ways).
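For the curious, that "single, common source" can be written down compactly. This is the standard textbook linear-Gaussian analysis equation (notation simplified, not any center's exact formulation): OI, 3DVAR, and the Kalman filter update step all produce an analysis of the form

$$
\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\big(\mathbf{y} - H(\mathbf{x}_b)\big), \qquad
\mathbf{K} = \mathbf{B}\mathbf{H}^{\mathrm{T}}\big(\mathbf{H}\mathbf{B}\mathbf{H}^{\mathrm{T}} + \mathbf{R}\big)^{-1}
$$

where x_b is the background, y the observations, H the observation operator (H its linearization), and B and R the background and observation error covariances. The flavors differ mainly in how B is constructed and whether it evolves in time.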

In data assimilation, we try to estimate an atmospheric state, but there are unknown errors (both in the dynamical model and in the observations), so the means by which we try to find a solution is statistical (i.e. maximum likelihood, or probability). 3DVAR and 4DVAR are particular algorithms used to estimate the state of the atmosphere (typically in the space of the NWP model), assuming various things about the model/observations (in particular, their error characteristics and structure; typically assumed to be normal/Gaussian).

To accomplish this, a cost function is minimized that tries to strike the best balance between a first guess/background (information from the dynamical model, in our case a 6-hour model forecast) and observations (taken over some time interval; in the case of the GFS/GDAS, +/- 3 hours). 3DVAR is actually a type (subset) of 4DVAR, with one simplifying assumption. In 3DVAR, the observations within the time window are assumed to all be "valid" at the actual analysis update time...which means that there is no need to include the dynamic model in the minimization algorithm directly. There are tricks to help make 3DVAR slightly more like 4DVAR, at little cost, but I won't get into that.
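For anyone who does want to see one equation: a standard textbook form of the 3DVAR cost function (notation simplified) is

$$
J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
+ \tfrac{1}{2}\big(\mathbf{y}-H(\mathbf{x})\big)^{\mathrm{T}}\mathbf{R}^{-1}\big(\mathbf{y}-H(\mathbf{x})\big)
$$

The first term penalizes departures from the background x_b (weighted by the inverse background error covariance B), the second penalizes departures from the observations y (weighted by the inverse observation error covariance R), and H maps the model state into observation space. The state that minimizes J is the analysis.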

In 4DVAR, the (linearized version of the) model is actually part of the minimization, and the solution is in essence an update to a whole trajectory. This is in contrast to 3DVAR, where the simplifying assumptions result in updating the state at a single time. All else being equal, this means that 4DVAR is in general much superior to 3DVAR. However, recall again that minimization algorithms are used to find the maximum probability/best state estimate. So, say for example that it takes 50 iterations to find the solution to the problem...this implies that you will actually need to run the dynamic model 100 times (!!!) as part of the algorithm (it's effectively 100, and not 50, since there is a forward/'backward' [gradient/adjoint] component). Even if you only use a 6- or 12-hour window, this can be hugely expensive. Additionally, the version of the model that has to be used in the inner loops needs to be (tangent) linear [linearizing full-blown NWP models is very messy]. This is why the analysis generated for the GFS/GDAS is actually at full model resolution, whereas the effective resolution of the analysis increment at ECMWF (and others running 4DVAR) is much lower than that of the actual model itself.
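Again in generic textbook form (not the exact operational formulation), strong-constraint 4DVAR replaces the single observation term with a sum over the observation times in the window:

$$
J(\mathbf{x}_0) = \tfrac{1}{2}(\mathbf{x}_0-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}_0-\mathbf{x}_b)
+ \tfrac{1}{2}\sum_{i=0}^{n}\big(\mathbf{y}_i - H_i(M_{0\rightarrow i}(\mathbf{x}_0))\big)^{\mathrm{T}}\mathbf{R}_i^{-1}\big(\mathbf{y}_i - H_i(M_{0\rightarrow i}(\mathbf{x}_0))\big)
$$

Here M_{0→i} is the forecast model integrating the initial state x_0 forward to observation time i. Each gradient evaluation requires one forward integration of the (tangent-linear) model and one backward integration of its adjoint, which is where the "effectively 100 model runs for 50 iterations" figure above comes from.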

I highly recommend going through the training module on MetEd (by UCAR), entitled "Understanding Assimilation Systems: How models create their initial conditions" (sign up should be free).

There are also some nice tutorial notes and lecture slides from the ECMWF DA-NWP training online (all credit to the folks there; this is a hugely valuable resource). Fair warning: you'll need a good bit of linear algebra and statistics background to get anything meaningful out of it.

Data Assimilation Background Slides

3DVAR Lecture Notes ; 3DVAR Slides

4DVAR Lecture Notes ; 4DVAR Slides

Of course, I am also happy to clarify anything, expand upon this, answer questions etc.

p.s. I will try to address some of the other things Adam brought up in a later post....


First, let me just state that the questions you have listed above are covered in about 15 credit hours of graduate school work, hundreds of peer reviewed journal articles, numerous textbooks, etc.

So you're saying this is a breeze then, right?

Thanks for taking the time to do this, Daryl. I think everyone is going to learn a lot from it.


In terms of research, I've been pretty impressed by storm-scale EnKF analyses, but the forecasts are still bad.

Do you have any specific dates/cases/examples? We are actually planning to implement a hybrid EnKF-Var algorithm into the GDAS/GFS in spring, and will be running experiments covering the HFIP period (i.e. this hurricane season). I'll be curious to see if the hybrid improves upon the EnKF-based forecasts for any examples you might be able to provide.


Do you have any specific dates/cases/examples? We are actually planning to implement a hybrid EnKF-Var algorithm into the GDAS/GFS in spring, and will be running experiments covering the HFIP period (i.e. this hurricane season). I'll be curious to see if the hybrid improves upon the EnKF-based forecasts for any examples you might be able to provide.

I'm talking really high resolution supercell/tornado cases. Most published cases focus on analyses because the forecasts just fall apart. I don't know how that compares to the coarser resolution stuff you're talking about.


I'm talking really high resolution supercell/tornado cases. Most published cases focus on analyses because the forecasts just fall apart. I don't know how that compares to the coarser resolution stuff you're talking about.

Ah, I understand now what you're getting at (I've seen many presentations on this stuff as well). As an aside, I think it is becoming pretty well established that EnKF (and hybrid) analyses are particularly good for tropical cyclones (analyses and forecasts), since their error covariances are so robust.


Thanks dtk, good post. The relatively small amount of numerical modeling I did in undergrad was just enough to illustrate the amazing complexity of data assimilation, objective analysis, and numerical computation. Weather models are truly a testament to human ingenuity.

For those who want a very basic, beginner-level understanding of the topic, from ECMWF:


Next up: what is a spectral model?

Basically, instead of variables being analyzed at grid points, they're analyzed as a Fourier series (waves of varying amplitude and wavelength), so that derivatives in the dynamical equations can be computed on continuous functions, without any smoothing or loss of accuracy.


Basically, instead of variables being analyzed at grid points, they're analyzed as a Fourier series (waves of varying amplitude and wavelength), so that derivatives in the dynamical equations can be computed on continuous functions, without any smoothing or loss of accuracy.

Right, with two caveats:

1) The vertical is represented by discrete layers just as in a regular, gridpoint model (for the GFS, these are hybrid sigma-pressure layers -- terrain-following near the surface, constant pressure from the stratosphere to the top, and a mix of the two in between; see the sketch just after this list).

2) There are transforms that are necessary to go between wave space and physical space (within the model itself, since the physical parameterizations are applied to the physical variables, not to their spectral representation).
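To make caveat 1) a bit more concrete, here is a minimal Python sketch of how pressures on hybrid sigma-pressure levels are computed from per-level coefficients. The A/B values below are made up for illustration and are not the actual GFS coefficients.

```python
import numpy as np

# Hypothetical hybrid-coordinate coefficients (NOT the real GFS values).
# Pressure on model level k is p_k = A_k + B_k * p_surface: near the surface
# B ~ 1 so levels follow the terrain, while aloft B -> 0 so levels become
# constant-pressure surfaces.
A = np.array([0.0, 5000.0, 20000.0, 22000.0, 5000.0])  # Pa
B = np.array([1.0, 0.80,   0.30,    0.03,    0.0])     # dimensionless

def hybrid_level_pressure(p_surface_pa: float) -> np.ndarray:
    """Return the pressure (Pa) on each hybrid level for a given surface pressure."""
    return A + B * p_surface_pa

print(hybrid_level_pressure(101325.0))  # sea-level surface pressure
print(hybrid_level_pressure(85000.0))   # high terrain: low levels shift, the top level does not
```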

You are also right to point out that the derivatives are much more accurate (and cheap to calculate) in wave space. Additionally, you can use a much longer time step with spectral models, which helps with efficiency.
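A toy 1-D illustration of that accuracy argument (just a Python sketch, nothing like a real dynamical core): differentiating a resolved wave in wave space is essentially exact, while a second-order centered difference on the same grid is not.

```python
import numpy as np

# Toy 1-D periodic domain: compare a spectral (FFT) derivative with a
# 2nd-order centered finite difference for a single resolved wave.
N = 64
L = 2 * np.pi
x = np.linspace(0.0, L, N, endpoint=False)
dx = L / N

f = np.sin(3 * x)            # a resolved wave
exact = 3 * np.cos(3 * x)    # its analytic derivative

# Spectral derivative: multiply each Fourier coefficient by i*k.
k = np.fft.fftfreq(N, d=dx) * 2 * np.pi
spectral = np.real(np.fft.ifft(1j * k * np.fft.fft(f)))

# Centered finite difference on the same grid (periodic wrap-around).
centered = (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

print("spectral max error :", np.max(np.abs(spectral - exact)))  # ~machine precision
print("centered max error :", np.max(np.abs(centered - exact)))  # orders of magnitude larger
```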

From the original questions: Why do spectral models have their resolution expressed in numbers of triangles instead of distances?

The resolution of a spectral model is described in terms of the number of waves it can represent in the horizontal. The "T" actually stands for triangular truncation (though the number isn't really expressing a number of triangles...it's expressing the number of waves in a single direction). There is a second type of spectral truncation, rhomboidal, that I don't believe is used much (if anywhere, operationally at least). The minimum wavelength that a spectral model can adequately represent is roughly 360 (degrees) / N, where N is the triangular truncation.
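To make the 360/N rule of thumb concrete, a quick sketch (the truncations listed are just examples of the arithmetic; this gives the smallest resolvable wavelength, not the spacing of the associated transform grid):

```python
# Rough rule of thumb from the 360/N relation above. This gives the smallest
# adequately represented wavelength, not the model's effective grid spacing.
EARTH_CIRCUMFERENCE_KM = 40075.0  # approximate equatorial circumference

def min_wavelength(n_truncation: int) -> tuple[float, float]:
    """Return (degrees, km) of the smallest wavelength a T<N> model represents."""
    return 360.0 / n_truncation, EARTH_CIRCUMFERENCE_KM / n_truncation

for n in (126, 254, 574):
    deg, km = min_wavelength(n)
    print(f"T{n}: ~{deg:.2f} degrees, ~{km:.0f} km")
```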


Another plug for the UCAR MetEd modules for a good description of NWP related things (some information about gridpoint versus spectral, model parameterizations, etc.).

Impact of Model Structure and Dynamics

Can someone visit this thread and translate/dumb it down for those with IQs between 125 and 160?

Another great recommendation for some beginner (and also some semi-advanced training) is from ECMWF. They have a TON of information here with multiple sublinks beyond the main topics.

http://www.ecmwf.int/products/forecasts/guide/index.html


Can someone visit this thread and translate/dumb it down for those with IQs between 125 and 160?

It's not even a matter of having an IQ in that range; even very smart individuals will have a hard time understanding what he is talking about if they don't have a significant background in calculus, as well as other things.

But, I agree. I'd really love to know what he is trying to teach, except I don't even know where to begin. I've been reading it all, but to be completely honest, I don't really have an understanding of what I'm actually reading.


Another great recommendation for some beginner (and also some semi-advanced training) is from ECMWF. They have a TON of information here with multiple sublinks beyond the main topics.

http://www.ecmwf.int...uide/index.html

Thank you. I've read all of the other links posted as well, and they definitely help give some sort of grasp of what is being discussed. But this is a very intense topic.

