Work with obsSpace#
JEDI or GSI generates observation space diagnostic files, which contains original observation information, hofx (H(x), i.e. model counter-parts at the observataion locations), OMB (observation minus background), OMA as well as other information.
This notebook covers how to deal with JEDI diganostic files (which are also called output ioda files). We call them jdiag files.
import packages and define variables#
%%time
# autoload external python modules if they changed
%load_ext autoreload
%autoreload 2
import os, sys
pyDAmonitor_ROOT=os.getenv("pyDAmonitor_ROOT")
if pyDAmonitor_ROOT is None:
print("!!! pyDAmonitor_ROOT is NOT set. Run `source ush/load_pyDAmonitor.sh`")
else:
print(f"pyDAmonitor_ROOT={pyDAmonitor_ROOT}\n")
sys.path.insert(0, pyDAmonitor_ROOT)
# import modules
import warnings
import math
import numpy as np
import uxarray as ux
import xarray as xr
import pandas as pd
import seaborn as sns # seaborn handles NaN values automatically
import matplotlib.pyplot as plt
from netCDF4 import Dataset
from DAmonitor.base import query_dataset, query_data, query_obj, to_dataframe, aircraft_reject_list_to_df
#jdiag_file=f"{pyDAmonitor_ROOT}/data/ioda/ioda_aircar.2024050600.nc"
jdiag_file=f"{pyDAmonitor_ROOT}/data/mpasjedi/jdiag_aircar_t133.nc"
pyDAmonitor_ROOT=/home/Guoqing.Ge/pyDAmonitor
CPU times: user 3.27 s, sys: 1.05 s, total: 4.32 s
Wall time: 5.01 s
Use the obsSpace class to read jdiag files#
from DAmonitor.obs import obsSpace, fit_rate
# create a t133 object using the obsSpace class
t133=obsSpace(jdiag_file)
# check the t133 object
query_obj(t133)
['bt',
'ds',
'filepath',
'groups',
'locations',
'metadata',
'nlocs',
'q',
't',
'u',
'v']
np.set_printoptions(threshold=np.inf) # print out all array values
# print(t133.t.aircraftFlightNumber)
# print(t133.t.stationIdentification)
print(t133.t.stationIdentification[10])
print(t133.t.aircraftFlightNumber[10])
#
np.set_printoptions(threshold=500) # don't print out all array values
print(t133.t.ObsValue)
KKJ1Z4BA
R33RPDYQ
[218.15 223.15 250.45 ... 224.15 222.15 224.15]
# query data
query_data(t133.q) # since the t133 object does not contain q observations, so it will display meta information
stationIdentification, aircraftFlightPhase, timeOffset, prepbufrDataLvlCat, longitude, stationElevation, latitude, prepbufrReportType, pressure, dateTime, height, dumpReportType, aircraftFlightNumber, ObsError, ObsType, ObsValue, QualityMarker, Location
# query data
query_data(t133.t)
EffectiveError0, EffectiveError1, EffectiveError2, EffectiveQC0, EffectiveQC1, EffectiveQC2, stationIdentification, aircraftFlightPhase, timeOffset, prepbufrDataLvlCat, longitude, stationElevation, latitude, prepbufrReportType, pressure, dateTime, height, dumpReportType, aircraftFlightNumber, ObsBias0, ObsBias1, ObsBias2, ObsError, ObsType, ObsValue, QualityMarker, TempObsErrorData, hofx0, hofx1, hofx2, innov1, oman, ombg, Location
The above output shows that we reorganize the data structure based on the observation variable, i.e. t, q, uv, bt
We can access values easily using popular Python class strucutre, i.e. t133.t, t133.t.ObsValue, etc
Convert to Pandas DataFrame#
Converting jdiag data into Pandas DataFrame brings up lots of benefits and we can use utilize lots of mature DataFrames capabilities.
We can see this from the following cell.
pd.set_option('display.max_columns', None) # show all dataframe columns
df = to_dataframe(t133.t)
df[df["QualityMarker"]>=12] # filter the data frame
#df
| EffectiveError0 | EffectiveError1 | EffectiveError2 | EffectiveQC0 | EffectiveQC1 | EffectiveQC2 | stationIdentification | aircraftFlightPhase | timeOffset | prepbufrDataLvlCat | longitude | stationElevation | latitude | prepbufrReportType | pressure | dateTime | height | dumpReportType | aircraftFlightNumber | ObsBias0 | ObsBias1 | ObsBias2 | ObsError | ObsType | ObsValue | QualityMarker | TempObsErrorData | hofx0 | hofx1 | hofx2 | innov1 | oman | ombg | Location |
|---|
Read the aircraft reject list from “current_bad_aircraft.txt”#
filepath=f"{pyDAmonitor_ROOT}/data/current_bad_aircraft.txt"
dflist = aircraft_reject_list_to_df(filepath)
dflist
| tailID | flagT | flagW | flagR | FSL | MDCRS | N | bs_T | std_T | bs_S | std_S | bs_D | std_D | bs_W | std_W | bs_RH | std_RH | fail_reason | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00000320 | T | W | - | 26490 | -------- | 470 | 11.3 | 2.8 | 0.8 | 2.7 | -8 | 11 | 1.9 | 4.0 | 0.0 | 0.0 | Unknown(bias_Tstd_Tbias_DIR) |
| 1 | 00000400 | T | W | R | 0 | -------- | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | UND_Piper(resrch_T_W_R) |
| 2 | 00000450 | T | W | R | 9086 | -------- | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | UND_Piper(resrch_T_W_Rresrch_T_W_R) |
| 3 | 00000451 | T | W | R | 9148 | -------- | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | King_Air(resrch_T_W_RU_Wyo_T_W_R) |
| 4 | 00000499 | T | W | R | 0 | -------- | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | Unknown(TAM_FS_T_W_R) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 278 | N988DL | - | W | - | 250 | C5M13RZA | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | MD88(md88_W) |
| 279 | N989DL | - | W | - | 251 | ENXL3TBA | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | MD88(md88_W) |
| 280 | N990AK | T | - | - | 31354 | VM3LX3RA | 924 | -0.4 | 15.6 | 0.3 | 2.1 | -1 | 7 | 1.6 | 3.0 | 0.0 | 0.0 | Unknown(std_T) |
| 281 | OE-LPL | - | W | - | 32404 | UAV15PJA | 310 | 0.9 | 0.9 | 1.1 | 2.2 | 18 | 63 | 18.1 | 21.8 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 282 | OE-LPM | - | W | - | 32579 | BGWUR1ZA | 269 | 0.2 | 1.0 | 1.0 | 2.3 | 11 | 65 | 17.5 | 21.0 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
283 rows × 18 columns
check if any tailID is in the jdiag stationIdentification field#
# all 283 tailIDs are NOT found in jdiag `stationIdentification` fields
mask = dflist["tailID"].isin(df["stationIdentification"])
print(len(dflist[mask]))
0
# this confirms all 283 rows are NOT found in jdiag `stationIdentification` fields
mask = ~dflist["tailID"].isin(df["stationIdentification"])
dflist[mask]
| tailID | flagT | flagW | flagR | FSL | MDCRS | N | bs_T | std_T | bs_S | std_S | bs_D | std_D | bs_W | std_W | bs_RH | std_RH | fail_reason | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00000320 | T | W | - | 26490 | -------- | 470 | 11.3 | 2.8 | 0.8 | 2.7 | -8 | 11 | 1.9 | 4.0 | 0.0 | 0.0 | Unknown(bias_Tstd_Tbias_DIR) |
| 1 | 00000400 | T | W | R | 0 | -------- | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | UND_Piper(resrch_T_W_R) |
| 2 | 00000450 | T | W | R | 9086 | -------- | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | UND_Piper(resrch_T_W_Rresrch_T_W_R) |
| 3 | 00000451 | T | W | R | 9148 | -------- | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | King_Air(resrch_T_W_RU_Wyo_T_W_R) |
| 4 | 00000499 | T | W | R | 0 | -------- | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | Unknown(TAM_FS_T_W_R) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 278 | N988DL | - | W | - | 250 | C5M13RZA | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | MD88(md88_W) |
| 279 | N989DL | - | W | - | 251 | ENXL3TBA | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | MD88(md88_W) |
| 280 | N990AK | T | - | - | 31354 | VM3LX3RA | 924 | -0.4 | 15.6 | 0.3 | 2.1 | -1 | 7 | 1.6 | 3.0 | 0.0 | 0.0 | Unknown(std_T) |
| 281 | OE-LPL | - | W | - | 32404 | UAV15PJA | 310 | 0.9 | 0.9 | 1.1 | 2.2 | 18 | 63 | 18.1 | 21.8 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 282 | OE-LPM | - | W | - | 32579 | BGWUR1ZA | 269 | 0.2 | 1.0 | 1.0 | 2.3 | 11 | 65 | 17.5 | 21.0 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
283 rows × 18 columns
check if any MDCRS is in the jdiag stationIdentification fields#
mask = dflist["MDCRS"].isin(df["stationIdentification"])
print(len(dflist[mask]))
dflist[mask]
53
| tailID | flagT | flagW | flagR | FSL | MDCRS | N | bs_T | std_T | bs_S | std_S | bs_D | std_D | bs_W | std_W | bs_RH | std_RH | fail_reason | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | N12005 | - | W | - | 23986 | X0ZRVXRA | 410 | 0.3 | 0.9 | 0.7 | 2.5 | 19 | 61 | 20.6 | 24.2 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 29 | N12006 | - | W | - | 24180 | GN0RPDJA | 334 | 0.9 | 0.9 | 0.5 | 2.3 | 8 | 37 | 13.2 | 14.6 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 31 | N12012 | - | W | - | 26076 | LO5SVXJA | 401 | 0.8 | 1.0 | 0.6 | 2.2 | 6 | 59 | 19.5 | 22.4 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 34 | N12116 | T | - | - | 5159 | SSQRPMJA | 482 | 2.8 | 1.2 | 0.9 | 3.6 | -1 | 8 | 2.0 | 4.9 | 0.0 | 0.0 | Unknown(bias_T) |
| 37 | N14011 | - | W | - | 25904 | 5YSSATRA | 554 | 0.6 | 0.9 | 0.5 | 2.3 | 4 | 57 | 15.8 | 18.4 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 38 | N14016 | - | W | - | 29764 | JS5BPMRA | 390 | 0.8 | 1.0 | 0.3 | 2.5 | 9 | 55 | 15.1 | 17.6 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 43 | N17002 | - | W | - | 23457 | WSRSVRZA | 280 | 0.5 | 1.2 | 0.4 | 2.2 | 0 | 51 | 13.5 | 15.6 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 45 | N17017 | - | W | - | 29260 | VBVTBYJA | 501 | 1.1 | 1.0 | 0.8 | 2.5 | 1 | 61 | 14.4 | 17.3 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 46 | N17963 | - | W | - | 13118 | CKHSYBBA | 242 | 1.1 | 0.9 | 0.5 | 2.2 | 4 | 59 | 14.5 | 17.1 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 48 | N19986 | - | W | - | 27637 | 01CBZMBA | 396 | 0.6 | 0.9 | 0.4 | 2.4 | -5 | 52 | 11.6 | 13.8 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 50 | N211WN | - | - | R | 11131 | ZS4UOGBA | 3228 | 0.1 | 0.8 | 0.3 | 1.8 | -1 | 7 | 1.4 | 2.6 | 12.6 | 25.0 | Unknown(bias_RHstd_RH) |
| 51 | N213WN | - | - | R | 11076 | M5OEOGJA | 3576 | -0.2 | 0.7 | 0.3 | 1.7 | -2 | 8 | 1.3 | 2.5 | 0.2 | 45.4 | Unknown(std_RH) |
| 54 | N236WN | - | - | R | 11098 | VXZUOFRA | 3113 | -0.1 | 0.7 | 0.4 | 1.7 | -1 | 6 | 1.3 | 2.5 | -52.1 | 21.9 | Unknown(bias_RHstd_RH) |
| 66 | N267WN | - | - | R | 11368 | US5EOFJA | 3244 | -0.2 | 0.8 | 0.3 | 1.7 | -0 | 6 | 1.2 | 2.4 | 58.4 | 89.2 | Unknown(bias_RHstd_RH) |
| 78 | N27901 | - | W | - | 11326 | 1NISBMJA | 389 | 1.5 | 1.0 | 1.0 | 3.1 | 6 | 52 | 11.1 | 13.3 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 80 | N27964 | - | W | - | 13170 | 3N5BFFBA | 382 | 1.1 | 1.0 | 0.8 | 2.4 | -6 | 64 | 13.1 | 16.0 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 84 | N281WN | - | - | R | 11419 | 5EVEOGBA | 3039 | 0.0 | 0.7 | 0.2 | 1.6 | -0 | 6 | 1.2 | 2.2 | 57.2 | 85.6 | Unknown(bias_RHstd_RH) |
| 90 | N28912 | - | W | - | 11984 | 4B1SV4JA | 564 | 0.2 | 0.8 | 1.2 | 2.3 | 10 | 58 | 14.1 | 16.5 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 91 | N28987 | - | W | - | 27599 | MWFDFYRA | 542 | 0.0 | 1.1 | 0.6 | 2.3 | 4 | 68 | 11.5 | 14.4 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 96 | N29907 | - | W | - | 11481 | GYOTA4ZA | 484 | 0.3 | 0.8 | 0.6 | 2.5 | 10 | 52 | 12.6 | 14.7 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 100 | N30913 | - | W | - | 12237 | XNJCPKZA | 481 | 0.2 | 0.8 | 0.7 | 2.5 | 8 | 47 | 14.8 | 16.7 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 109 | N394UP | T | - | - | 26003 | JEWWUUJA | 645 | 3.4 | 1.0 | -0.4 | 2.2 | 2 | 10 | 1.7 | 3.3 | 0.0 | 0.0 | Unknown(bias_T) |
| 118 | N410UP | - | - | R | 701 | PPJGU1JA | 2154 | 1.3 | 0.9 | 0.5 | 2.3 | -1 | 7 | 1.6 | 3.1 | 52.1 | 91.0 | Unknown(bias_RHstd_RH) |
| 123 | N438WN | - | - | R | 11165 | DVPUOFZA | 2866 | -0.0 | 0.8 | 0.2 | 1.6 | -0 | 7 | 1.3 | 2.5 | -11.0 | 65.6 | Unknown(bias_RHstd_RH) |
| 124 | N442WN | - | - | R | 11179 | ZGUUOGRA | 3035 | -0.2 | 0.8 | 0.4 | 1.9 | -1 | 7 | 1.4 | 2.7 | 59.1 | 90.3 | Unknown(bias_RHstd_RH) |
| 128 | N45905 | - | W | - | 11430 | PIKRV4RA | 420 | 0.8 | 0.9 | 0.8 | 2.2 | 17 | 53 | 19.7 | 22.4 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 129 | N459WN | - | - | R | 10804 | CGCEOHBA | 3247 | -0.0 | 0.7 | 0.2 | 1.6 | -0 | 7 | 1.2 | 2.3 | 42.7 | 75.0 | Unknown(bias_RHstd_RH) |
| 135 | N6715C | T | - | - | 2878 | 550IZAZA | 239 | 2.3 | 0.9 | 0.8 | 2.8 | -1 | 10 | 1.8 | 3.9 | 0.0 | 0.0 | Unknown(bias_T) |
| 136 | N684UA | T | - | - | 24706 | 2G1IR4ZA | 630 | -5.2 | 6.0 | 0.3 | 3.7 | -2 | 8 | 2.4 | 4.8 | 0.0 | 0.0 | Unknown(bias_Tstd_T) |
| 149 | N801AC | - | W | - | 12328 | VMUIZ4BA | 416 | 1.3 | 0.8 | 0.3 | 2.4 | 11 | 45 | 16.5 | 18.3 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 150 | N802AN | - | W | - | 12501 | EPSEILRA | 214 | 0.9 | 0.8 | 0.7 | 2.7 | 13 | 45 | 10.5 | 12.2 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 152 | N804AN | - | W | - | 12482 | 0OEUILJA | 436 | 0.7 | 0.8 | 0.6 | 2.5 | 11 | 45 | 11.9 | 13.7 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 153 | N805AN | - | W | - | 12487 | YX5EILZA | 405 | 0.8 | 0.9 | 1.0 | 2.2 | 18 | 60 | 16.0 | 18.9 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 154 | N806AA | - | W | - | 12575 | WITICOZA | 289 | 0.9 | 0.8 | 0.9 | 2.5 | 17 | 43 | 15.2 | 17.0 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 156 | N806NW | T | - | - | 10675 | AFIHLIJA | 360 | -2.6 | 4.3 | 0.9 | 2.6 | -0 | 5 | 2.1 | 3.6 | 0.0 | 0.0 | Unknown(bias_Tstd_T) |
| 163 | N814AA | - | W | - | 13179 | LNWICORA | 488 | 0.8 | 0.8 | 0.9 | 2.5 | 11 | 52 | 12.5 | 14.9 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 164 | N815AA | - | W | - | 13251 | J4RYCOJA | 545 | -0.2 | 0.9 | 0.6 | 2.2 | 12 | 46 | 15.1 | 16.9 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 165 | N817AN | - | W | - | 13837 | KCWEIERA | 409 | 1.0 | 0.7 | 0.6 | 2.6 | 14 | 45 | 14.0 | 15.9 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 170 | N823AN | - | W | - | 13667 | BOZEILJA | 399 | -0.2 | 0.8 | 0.9 | 2.3 | 13 | 46 | 17.7 | 19.7 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 171 | N824AN | - | W | - | 13693 | 12KEILJA | 403 | 0.3 | 1.0 | 0.7 | 2.4 | -1 | 40 | 11.5 | 12.9 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 176 | N8310C | - | - | R | 12692 | SQXIOYZA | 2706 | -0.1 | 0.6 | 0.3 | 1.7 | -1 | 6 | 1.3 | 2.5 | 23.5 | 90.8 | Unknown(bias_RHstd_RH) |
| 178 | N832AA | - | W | - | 18533 | 1AQICMZA | 338 | 1.1 | 0.9 | 0.9 | 2.5 | 13 | 41 | 13.4 | 15.1 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 188 | N870AX | - | W | - | 26291 | JTFX3UJA | 236 | 0.9 | 0.8 | 0.8 | 2.4 | 4 | 28 | 9.3 | 10.4 | 0.0 | 0.0 | Unknown(std_Wrms_W) |
| 191 | N873BB | - | W | - | 27881 | 1FZYW3BA | 248 | 0.1 | 0.9 | 0.8 | 2.2 | 4 | 40 | 9.5 | 10.8 | 0.0 | 0.0 | Unknown(std_DIRstd_Wrms_W) |
| 192 | N874AN | - | W | - | 29211 | BTXEILZA | 335 | 1.4 | 0.7 | 0.8 | 2.7 | 8 | 31 | 9.8 | 11.1 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 193 | N8755L | T | - | - | 28364 | BNJLYXJA | 2794 | 1.3 | 3.1 | 0.2 | 1.9 | -1 | 6 | 1.4 | 2.7 | 0.0 | 0.0 | Unknown(std_T) |
| 200 | N881BK | - | W | - | 29385 | JJ3LILRA | 479 | 1.0 | 0.9 | 0.5 | 2.4 | 10 | 54 | 14.1 | 16.2 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 201 | N882BL | - | W | - | 30541 | CNZ13PZA | 514 | 0.8 | 0.9 | 0.8 | 2.5 | 16 | 44 | 13.0 | 14.8 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 202 | N8838Q | T | - | - | 29942 | PAKWX3RA | 2987 | -3.1 | 0.9 | 0.3 | 2.1 | -1 | 7 | 1.5 | 3.1 | 0.0 | 0.0 | Unknown(bias_T) |
| 204 | N884AA | - | W | - | 29680 | 2IZYCORA | 248 | 0.6 | 0.8 | 0.4 | 2.5 | 21 | 68 | 22.8 | 26.8 | 0.0 | 0.0 | Unknown(bias_DIRstd_DIRstd_Wrms_W) |
| 206 | N8863Q | T | - | - | 30549 | 0XKWXBZA | 2333 | 0.3 | 2.3 | 0.3 | 1.9 | -0 | 7 | 1.5 | 2.9 | 0.0 | 0.0 | Unknown(std_T) |
| 241 | N944WN | - | - | R | 11935 | NFFUOGZA | 2948 | -0.2 | 0.7 | 0.2 | 1.6 | -1 | 6 | 1.2 | 2.3 | -24.2 | 14.9 | Unknown(bias_RH) |
| 243 | N945WN | - | - | R | 10980 | XU4UOFBA | 2798 | 0.1 | 0.7 | 0.2 | 1.6 | -0 | 7 | 1.2 | 2.3 | -25.5 | 14.5 | Unknown(bias_RH) |
check if any tailID or MDCRS is in the jdiag stationIdentification fields#
mask = dflist["tailID"].isin(df["aircraftFlightNumber"])
print(len(dflist[mask]))
mask = dflist["MDCRS"].isin(df["aircraftFlightNumber"])
print(len(dflist[mask]))
0
0
Plot hisgrams#
example 1#
plt.figure(figsize=(8, 5))
#sns.histplot(df["oman"], bins=50, kde=True, color="steelblue")
sns.histplot(t133.t.oman, bins=100, kde=True, color="steelblue")
plt.title("Histogram of oman")
plt.xlabel("oman values")
plt.ylabel("Density")
plt.tight_layout()
plt.show()
example 2#
df_long = df[["oman", "ombg"]].melt(var_name="variable", value_name="value")
plt.figure(figsize=(8, 5))
sns.histplot(data=df_long, x="value", hue="variable", bins=50, kde=True, element="step", stat="count")
plt.title("Overlayed Histogram: oman vs ombg")
plt.tight_layout()
plt.show()
example 3#
plt.figure(figsize=(8, 5))
sns.histplot(df["oman"], bins=100, kde=True, color="blue", label="oman", multiple="layer")
sns.histplot(df["ombg"], bins=100, kde=True, color="red", label="ombg", multiple="layer")
plt.title("Overlayed Histogram: oman vs ombg")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.tight_layout()
plt.show()
plot fitting rate#
The rate fitting to observtions is an important metric to evaluate data assimilation performance.
We don’t want a small fitting rate, which means very little observation impacts on the model forecast.
We don’t want a large fitting rate either, which means we fit too close to the obervations and the model forecast will crash.
Ususally, a fitting rate of 20%~30% is expected.
# Filter valid data (both 'oman' and 'ombg' are not NaN)
valid_df = df[df["oman"].notna() & df["ombg"].notna()].copy()
valid_df = valid_df.dropna(subset=["height"]) # removes any rows in valid_df where height is missing (NaN)
# print(valid_df[valid_df["height"] < 0]["height"]) # negative height
dz = 1000
grouped = fit_rate(t133.t, dz=dz)
# 5. Plot vertical profile of fit_rate vs height
plt.figure(figsize=(7, 6))
plt.plot(grouped["fit_rate"], grouped["height_bin"], marker="o", color="blue")
# plt.axvline(x=0, color="gray", linestyle="--") # ax vertical line
plt.xlabel("Fit Rate (%)") # label change
plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x*100:.0f}%')) # format as %
plt.ylabel("Height Bin (m)")
plt.title("Vertical Profile of Fit Rate")
# Fine-tune ticks
plt.xticks(np.arange(0, 0.25, 0.05)) #, fontsize=12)
plt.yticks(np.arange(0, 13000, dz)) #, , fontsize=12)
# Add minor ticks
from matplotlib.ticker import AutoMinorLocator
plt.gca().xaxis.set_minor_locator(AutoMinorLocator())
plt.gca().yaxis.set_minor_locator(AutoMinorLocator())
# plt.grid(which='both', linestyle='--', linewidth=0.5)
plt.grid(True)
plt.ylim(0, 13000) # set y-axis from 0 (bottom) to 13,000 (top)
plt.tight_layout()
plt.show()
OMA: bias=0.1127 rms=0.9431
OMB: bias=0.1390 rms=1.0462
Overall fit_rate: 9.8576%
0, -1000, 1.7179, 1.7704, 2.9680%
1, 0, 1.3425, 1.4839, 9.5256%
2, 1000, 0.8929, 0.9891, 9.7274%
3, 2000, 0.9511, 1.0712, 11.2078%
4, 3000, 0.8140, 0.9379, 13.2102%
5, 4000, 0.7072, 0.8306, 14.8529%
6, 5000, 0.7251, 0.8159, 11.1235%
7, 6000, 0.7037, 0.7727, 8.9264%
8, 7000, 0.7456, 0.8361, 10.8245%
9, 8000, 0.7729, 0.8456, 8.5992%
10, 9000, 0.7673, 0.8331, 7.9049%
11, 10000, 0.8056, 0.8697, 7.3711%
12, 11000, 1.1992, 1.3272, 9.6471%
13, 12000, 1.3538, 1.4615, 7.3744%
print(grouped["height_bin"])
0 -1000
1 0
2 1000
3 2000
4 3000
5 4000
6 5000
7 6000
8 7000
9 8000
10 9000
11 10000
12 11000
13 12000
Name: height_bin, dtype: int32
plot satellite observations#
load satellite data using obsSpace#
%%time
cris_file = f"{pyDAmonitor_ROOT}/data/mpasjedi/jdiag_cris-fsr_n20.nc"
obsCris = obsSpace(cris_file)
CPU times: user 3.19 s, sys: 7.45 s, total: 10.6 s
Wall time: 36.8 s
query_data(obsCris.bt, meta_exclude="sensorCentralWavenumber_") # removing the meta_exclude paramter will output all sensorCentralWavenumber_* information
DerivedObsError, DerivedObsValue, CloudDetectMinResidualIR, NearSSTRetCheckIR, ScanEdgeRemoval, GrossCheck, Thinning, UseflagCheck, EffectiveError0, EffectiveError1, EffectiveError2, EffectiveQC0, EffectiveQC1, EffectiveQC2, sensorZenithAngle, satelliteIdentifier, sensorScanPosition, instrumentIdentifier, longitude, latitude, sensorViewAngle, scanLineNumber, solarZenithAngle, sensorChannelNumber, sensorAzimuthAngle, fieldOfViewNumber, dateTime, solarAzimuthAngle, ObsBias0, ObsBias1, ObsBias2, constantPredictor, emissivityJacobianPredictor, hofx0, hofx1, hofx2, innov1, lapseRatePredictor, lapseRate_order_2Predictor, oman, ombg, sensorScanAnglePredictor, sensorScanAngle_order_2Predictor, sensorScanAngle_order_3Predictor, sensorScanAngle_order_4Predictor, Channel, Location
print(obsCris.bt.hofx0)
[[-- -- -- ... -- -- --]
[222.63551330566406 224.9462127685547 222.70394897460938 ...
291.0711669921875 291.02618408203125 290.8084411621094]
[-- -- -- ... -- -- --]
...
[-- -- -- ... -- -- --]
[-- -- -- ... -- -- --]
[-- -- -- ... -- -- --]]
assemble target obs array#
ncount=0
idx = []
idx2 = []
ch=61
for n in np.arange(len(obsCris.bt.ombg[:,ch])):
#if obsCris.bt.CloudDetectMinResidualIR[n,ch] == 1:
if obsCris.bt.ombg[n,ch] > -200 and obsCris.bt.ombg[n,ch] < 200:
idx.append(n)
ncount = ncount + 1
lat=obsCris.bt.latitude[idx]
lon=obsCris.bt.longitude[idx]
obarray=obsCris.bt.DerivedObsValue[idx,ch]
print(lon,lat,obarray)
print(ncount)
[-127.40092 -129.51625 -131.27661 ... -132.82477 -135.50513 -137.78625] [24.14625 24.23062 24.33297 ... 47.29415 47.29884 47.30563] [233.91612 235.93317 237.10016 ... 232.89937 233.92868 234.85745]
9521
prepare coloar map#
datmi = np.nanmin(obarray) # Min of the data
datma = np.nanmax(obarray) # Max of the data
if np.nanmin(obarray) < 0:
cmax = datma
cmin = datmi
cmax=310
cmin=200
#cmax=1.0
#cmin=-1.0
cmap = 'RdBu'
else:
#cmax = omean+stdev
#cmin = np.maximum(omean-stdev, 0.0)
cma = datma
cmin = datmi
cmax=310
cmin=200
#cmax=1.0
#cmin=-1.0
cmap = 'RdBu'
cmap = 'viridis'
cmap = 'jet'
cmin = 200.
cmax = 310.
conus_12km = [-150, -50, 15, 55]
color_map = plt.cm.get_cmap(cmap)
reversed_color_map = color_map.reversed()
units = 'K'
#units = '%'
fig = plt.figure(figsize=(10, 5))
/tmp/ipykernel_4082298/1360930195.py:30: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.
color_map = plt.cm.get_cmap(cmap)
<Figure size 1000x500 with 0 Axes>
make the plot#
import cartopy.crs as ccrs
import matplotlib.ticker as mticker
import matplotlib.colors as colors
ax = plt.axes(projection=ccrs.PlateCarree(central_longitude=0))
# Plot grid lines
# ----------------
gl = ax.gridlines(crs=ccrs.PlateCarree(central_longitude=0), draw_labels=True,
linewidth=1, color='gray', alpha=0.5, linestyle='-')
gl.top_labels = False
gl.xlabel_style = {'size': 10, 'color': 'black'}
gl.ylabel_style = {'size': 10, 'color': 'black'}
gl.xlocator = mticker.FixedLocator(
[-180, -135, -90, -45, 0, 45, 90, 135, 179.9])
ax.set_ylabel("Latitude", fontsize=7)
ax.set_xlabel("Longitude", fontsize=7)
# Get scatter data
# ------------------
sc = ax.scatter(lon, lat,
c=obarray, s=4, linewidth=0,
transform=ccrs.PlateCarree(), cmap=cmap, vmin=cmin, vmax = cmax, norm=None, antialiased=True)
# Plot colorbar
# --------------
cbar = plt.colorbar(sc, ax=ax, orientation="horizontal", pad=.1, fraction=0.06,ticks=[200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310])
#cbar = plt.colorbar(sc, ax=ax, orientation="horizontal", pad=.1, fraction=0.06,ticks=[-3, -2.5, -2, -1.5, -1, -0.5, 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3 ])
#cbar = plt.colorbar(sc, ax=ax, orientation="horizontal", pad=.1, fraction=0.06,ticks=[0, 10, 20, 20, 40, 50, 60, 70, 80, 90, 100])
cbar.ax.set_ylabel(units, fontsize=10)
# Plot globally
# --------------
#ax.set_global()
#ax.set_extent(conus)
ax.set_extent(conus_12km)
# Draw coastlines
# ----------------
ax.coastlines()
ax.text(0.45, -0.1, 'Longitude', transform=ax.transAxes, ha='left')
ax.text(-0.08, 0.4, 'Latitude', transform=ax.transAxes,
rotation='vertical', va='bottom')
#text = f"Total Count:{datcont:0.0f}, Max/Min/Mean/Std: {datma:0.3f}/{datmi:0.3f}/{omean:0.3f}/{stdev:0.3f} {units}"
#print(text)
#ax.text(0.67, -0.1, text, transform=ax.transAxes, va='bottom', fontsize=6.2)
dpi=100
gl.top_labels = False
plt.tight_layout()
# show plot
# -----------
# pname='test.png'
# plt.savefig(pname, dpi=dpi)
For reference, using netCDF4 to read jdiag files directly#
dataset=Dataset(jdiag_file, mode='r')
query_dataset(dataset)
EffectiveError0
airTemperature
EffectiveError1
airTemperature
EffectiveError2
airTemperature
EffectiveQC0
airTemperature
EffectiveQC1
airTemperature
EffectiveQC2
airTemperature
MetaData
stationIdentification, aircraftFlightPhase, timeOffset, prepbufrDataLvlCat, longitude, stationElevation, latitude, prepbufrReportType, pressure, longitude_latitude_pressure, dateTime, height, dumpReportType, aircraftFlightNumber
ObsBias0
airTemperature
ObsBias1
airTemperature
ObsBias2
airTemperature
ObsError
airTemperature, relativeHumidity, windEastward, pressure, windNorthward
ObsType
specificHumidity, windEastward, airTemperature, windNorthward
ObsValue
dewpointTemperature, specificHumidity, airTemperature, windEastward, windNorthward
QualityMarker
airTemperature, pressure, windEastward, specificHumidity, height, windNorthward
TempObsErrorData
airTemperature
hofx0
airTemperature
hofx1
airTemperature
hofx2
airTemperature
innov1
airTemperature
oman
airTemperature
ombg
airTemperature
Location
NOTE:#
We can see that jdiag files (or output ioda files) have the group/variable structure.
And we need to use, for example, obs.groups["oman"].variables["airTemperature"][:] to access data.
Wit the obsSpace class, we can access the data very easily and conveniently by using obs.t.oman