I do know something about this. NCAR is running FV3-based ensembles at 3 km over the US (13 km global) out to 204 hours, and MPAS ensembles at 3 km over the US (15 km global, tapered) out to 132 hours. Just from a couple of days of looking, the probability-matched mean seems to run generally higher than the straight ensemble mean. Here is where we are for the first event:
FV3
MPAS
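
For anyone wondering why the probability-matched mean would tend to run higher than the straight mean: averaging members whose maxima sit in slightly different places smears the peaks out, while probability matching keeps the spatial pattern of the mean but reassigns amplitudes from the pooled member values. Below is a minimal sketch of that idea in Python with NumPy. It is only an illustration, not the code NCAR actually runs; the function name, the choice of the middle value from each group of pooled values, and the toy 2x2 data are all my assumptions.

```python
import numpy as np

def probability_matched_mean(members):
    """Illustrative probability-matched mean (hypothetical helper).

    members: array of shape (n_members, ny, nx), e.g. accumulated precip.
    Returns a (ny, nx) field with the spatial ranking of the ensemble mean
    and amplitudes drawn from the pooled member distribution.
    """
    members = np.asarray(members, dtype=float)
    n = members.shape[0]
    mean = members.mean(axis=0)

    # Pool every value from every member and sort from largest to smallest.
    pooled = np.sort(members.ravel())[::-1]
    # Keep one value per grid point: the middle of each group of n values.
    sampled = pooled[n // 2::n]

    # Rank grid points by the ensemble mean, largest first, and hand out
    # the sampled amplitudes in that order.
    order = np.argsort(mean.ravel())[::-1]
    pmm = np.empty(mean.size)
    pmm[order] = sampled
    return pmm.reshape(mean.shape)

# Toy example (made-up numbers): three members, each with the same maximum
# placed at a different grid point.
members = np.array([
    [[10.0, 0.0], [0.0, 0.0]],
    [[0.0, 10.0], [0.0, 0.0]],
    [[0.0, 0.0], [10.0, 0.0]],
])
mean = members.mean(axis=0)
pmm = probability_matched_mean(members)
print("straight mean max:", mean.max())
print("probability-matched mean max:", pmm.max())
```

In the toy case the straight mean never gets close to any single member's maximum, while the probability-matched field keeps it, which is at least consistent with what I have been seeing in these runs.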