探空气球数据提取与三次样条插值
1.数据准备
1.1数据下载
在大气科学实验当中,我们经常会使用到探空气球数据。通过它我们可以获得温度、大气压力、湿度、风速风向等气象要素。有很多网站可以对探空数据进行下载。本文使用了NOAA提供的探空数据。数据访问|国家环境信息中心(NCEI)前身为国家气候数据中心 (noaa.gov)。
1.2数据分析
下载后,我们首先对探空数据进行一个简单的了解。从这幅图我们可以看到,探空数据的格式是txt格式的。AFM00040948是这个测站的名字。
接下来我们打开它,对内部数据进行分析。从这张图片可以看出,探空数据其实并不好,存在大量的-9999和-8888等缺省值。我们首先要做的是对这些-9999进行剔除,并生成新的文件。此外,一个原始文件包含了很多个单独的数据,这张图上展示的就是#AEM00040948站点在2018年1月1日0点23分00秒时观测到的数据,一般来讲,一个探空测站一天会有两个数据记录,一个是晚上,一个是中午。在这幅图的下面,还存在很多这样的数据。我们为了方便处理,可以将每一个数据生成单独的文件。这样,无论是在后续科研或者数据处理时都有较好的帮助。
对于每一列数据表示的含义,我们可以根据官网给的说明文件获得相应的消息。
Data Record Format:
---------------------
-------------------------------
Variable Columns Type
-------------------------------
LVLTYP1 1- 1 Integer
LVLTYP2 2- 2 Integer
ETIME 4- 8 Integer
PRESS 10- 15 Integer
PFLAG 16- 16 Character
GPH 17- 21 Integer
ZFLAG 22- 22 Character
TEMP 23- 27 Integer
TFLAG 28- 28 Character
RH 29- 33 Integer
DPDP 35- 39 Integer
WDIR 41- 45 Integer
WSPD 47- 51 Integer
-------------------------------
These variables have the following definitions:
LVLÞTYP1 is the major level type indicator. It has the following
three possible values:
1 = Standard pressure level (for levels at 1000, 925, 850,
700, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30,
20, 10, 7, 5, 3, 2, and 1 hPa)
2 = Other pressure level
3 = Non-pressure level
LVLÞTYP2 is the minor level type indicator. It has the following
three possible values:
1 = Surface
2 = Tropopause
0 = Other
ETIME is the elapsed time since launch. The format is MMMSS, where
MMM represents minutes and SS represents seconds, though
values are not left-padded with zeros. The following special
values are used:
-8888 = Value removed by IGRA quality assurance, but valid
data remain at the same level.
-9999 = Value missing prior to quality assurance.
PRESS is the reported pressure (Pa or mb * 100, e.g.,
100000 = 1000 hPa or 1000 mb). -9999 = missing.
PFLAG is the pressure processing flag indicating what level of
climatology-based quality assurance checks were applied. It
has three possible values:
blank = Not checked by any climatology checks. If data value
not equal to -9999, it passed all other applicable
checks.
A = Value falls within "tier-1" climatological limits
based on all days of the year and all times of day
at the station, but not checked by
"tier-2" climatology checks due to
insufficient data.
B = Value passes checks based on both the tier-1
climatology and a "tier-2" climatology specific to
the time of year and time of day of the data value.
GPH is the reported geopotential height (meters above sea level).
This value is often not available at variable-pressure levels.
The following special values are used:
-8888 = Value removed by IGRA quality assurance, but valid
data remain at the same level.
-9999 = Value missing prior to quality assurance.
ZFLAG is the geopotential height processing flag indicating what
level of climatology-based quality assurance checks were
applied. It has three possible values:
blank = Not checked by any climatology checks or flag not
applicable. If data value not equal to -8888 or -9999,
it passed all other applicable checks.
A = Value falls within "tier-1" climatological limits
based on all days of the year and all times of day
at the station, but not checked by
"tier-2" climatology checks due to insufficient data.
B = Value passes checks based on both the tier-1
climatology and a "tier-2" climatology specific to
the time of year and time of day of the data value.
TEMP is the reported temperature (degrees C to tenths, e.g.,
11 = 1.1°C). The following special values are used:
-8888 = Value removed by IGRA quality assurance, but valid
data remain at the same level.
-9999 = Value missing prior to quality assurance.
TFLAG is the temperature processing flag indicating what
level of climatology-based quality assurance checks were
applied. It has three possible values:
blank = Not checked by any climatology checks or flag not
applicable. If data value not equal to -8888 or -9999,
it passed all other applicable checks.
A = Value falls within "tier-1" climatological limits
based on all days of the year and all times of day
at the station, but not checked by "tier-2"
climatology checks due to insufficient data.
B = Value passes checks based on both the tier-1
climatology and a "tier-2" climatology specific to
the time of year and time of day of the data value.
RH is the reported relative humidity (Percent to tenths, e.g.,
11 = 1.1%). The following special values are used:
-8888 = Value removed by IGRA quality assurance, but valid
data remain at the same level.
-9999 = Value missing prior to quality assurance.
DPDP is the reported dewpoint depression (degrees C to tenths, e.g.,
11 = 1.1°C). The following special values are used:
-8888 = Value removed by IGRA quality assurance, but valid
data remain at the same level.
-9999 = Value missing prior to quality assurance.
WDIR is the reported wind direction (degrees from north,
90 = east). The following special values are used:
-8888 = Value removed by IGRA quality assurance, but valid
data remain at the same level.
-9999 = Value missing prior to quality assurance.
WSPD is the reported wind speed (meters per second to tenths, e.g.,
11 = 1.1 m/s). The following special values are used:
-8888 = Value removed by IGRA quality assurance, but valid
data remain at the same level.
-9999 = Value missing prior to quality assurance.
除此之外,我们发现在下载的txt文件中并没有包含测站的经纬度信息。我们知道,在实验中,对于经纬度这种时空特征信息,也是比较重要的。为此,通过在官网查找,可以发现一个igra2-station-list.txt的文件。这个文件里面就包含这些测站的经纬度等信息。
2.数据处理
通过上面的分析。我们已经对探空数据有了一个初步的认知,对于数据的处理也有了一定的认识。接下来我们就借助编程语言和相关的ide工具对其进行处理。
在下面的操作中,我们需要win或者mac电脑一台。ide选择上,需要IDEA和Matlab。idea和Matlab的安装,可以去百度或者 lolxiaoguo.cn进行查看。
2.1maven工程创建
我们首先在idea中创建一个maven工程,名字可以自己取,我这里就取了roData这个名字。然后在java文件夹下分别创建了domain和utils两个文件夹,其中domain用来存放bean类,utils存放工具类。
2.2缺省值剔除
在第一部分中我们了解到,探空数据存在较多的缺省值,所以我们首要任务是缺省值的剔除。我们首先在utils中创建一个correctRaDataUtil工具类,该工具类实现代码如下:
package roDataDemo.utils;
import java.io.*;
import java.util.ArrayList;
import java.util.List;
public class correctRaDataUtil {
/**
* @param fileName 要处理的文件名
* @return 返回探空气球的标志#的所在行
*/
public static List<Integer> getData(String fileName) {
//首先读取源文件,将数据读出来
File file = new File(fileName);
try {
if (file.isFile() && file.exists()) {
InputStreamReader read;
read = new InputStreamReader(new FileInputStream(file));
BufferedReader bufferedReader = new BufferedReader(read);
String lineTxt;
int i = 0;
List<Integer> mark = new ArrayList<Integer>();
//先找出#所在的行
while ((lineTxt = bufferedReader.readLine()) != null) {
i = i + 1;
if (lineTxt.contains("#")) {
mark.add(i);
}
}
read.close();
return mark;
}
} catch (Exception ignored) {
}
return null;
}
/**
* @param fileName 要处理的文件名
* @param destinationFolder 生成的文件的输出位置
*/
public static void create(String fileName, String destinationFolder) {
List<Integer> index = getData(fileName);
try {
FileReader in = new FileReader(fileName);
LineNumberReader reader = new LineNumberReader(in);
reader.skip(Long.MAX_VALUE);
int lines = reader.getLineNumber();
File file = new File(fileName);
if (file.isFile() && file.exists()) { //判断文件是否存在
InputStreamReader read = new InputStreamReader(new FileInputStream(file));//考虑到编码格式
BufferedReader bufferedReader = new BufferedReader(read);
String lineTxt;
for (int i = 0; i < index.size(); i++) {
ArrayList<String> strings = new ArrayList<String>();
int j = 0;
if (i < (index.size() - 1)) {
for (int i1 = index.get(i); i1 < index.get(i + 1); i1++) {
lineTxt = bufferedReader.readLine();
//对lineTxt进行判断,如果
j = j + 1;
if (j == 1) {
strings.add(lineTxt);
}
if (j != 1 && !lineTxt.startsWith("9999", 17)
&& !(lineTxt.substring(16, 29).contains("8888"))
&& !lineTxt.startsWith("9999", 23)) {
String b = lineTxt.substring(16, 27).replace("B", "");
String c = b.replace("A", "");
strings.add(c);
}
}
}
if (i == index.size() - 1) {
for (int i1 = index.get(i); i1 < lines + 1; i1++) {
lineTxt = bufferedReader.readLine();
j = j + 1;
if (j == 1) {
strings.add(lineTxt);
}
if (j != 1 && !lineTxt.startsWith("9999", 17)
&& !(lineTxt.substring(16, 29).contains("8888"))
&& !lineTxt.startsWith("9999", 23)
) {
String b = lineTxt.substring(16, 27).replace("B", "");
String c = b.replace("A", "");
strings.add(c);
}
}
}
BufferedWriter bw = new BufferedWriter(new FileWriter
(destinationFolder + "/" + i + ".txt"));
for (String string : strings) {
bw.write(string);
bw.newLine();
bw.flush();
}
bw.close();
}
read.close();
}
} catch (Exception ignored) {
}
}
}
该工具类在使用时,只需要在外部直接调用correctRaDataUtil.create(a,b)方法,即可将探空数据中缺省值去除,并在目标文件夹b中生成新的文件。
2.3 探空数据的三次样条插值
在上面的处理中,我们完成了对探空数据的第一次处理。得到了没有缺省值的数据。但是有时根据实验的需要,我们还要对数据进行相应的插值处理,该部分以三次样条插值为例,对我们2.1生成的数据进行相应处理。在该部分的处理中,我们选择使用Matlab工具,至于为啥使用Matlab,那是因为Matlab中有一个函数可以直接进行三次样条插值,比较方便,哈哈哈哈。首先我们打开matlab,然后新建脚本,脚本名字自己按照自己的喜好取个名字。然后使用下面的代码对探空数据进行三次样条插值,并生成新的txt文件。此时生成txt文件,我们将其称为插值后标准文件,简称标准文件。
clear
clc
%输入文件夹
fileFolder=fullfile('/Users/guojiabin/Desktop/test/生成文件');
%输出文件夹
outputFolder=fullfile('/Users/guojiabin/Desktop/test/标准文件/');
dirOutput=dir(fullfile(fileFolder,'*.txt'));
fileNames={dirOutput.name};
for i=1:length(fileNames)
fin=fopen(char(fileNames(i)),'r');
stationInformation=fgetl(fin);
sb=strcat(stationInformation,'\n');
s=char(fileNames(i));
a=textread(char(strcat(fileFolder,'/',fileNames(i))),'','headerlines',1);
alt1=a(:,1);
alt=alt1/1000;
temp1=a(:,2);
temp=temp1/10+273.15;
trAlt=0:0.1:30;
%进行三次样条插值
trTemp=spline(alt,temp,trAlt);
data=[trAlt;trTemp];
fileID=fopen(strcat(outputFolder,s),'w');
fprintf(fileID,sb,'\n');
fprintf(fileID,'%.1f %.2f\n',data);%输出集合数组data
end
2.4 标准文件的处理与txtData类的创建
上文中,我们完成了对探空数据的三次样条插值,并生成了相对应的文件。其实到这里,基本上就完成了对探空数据的处理,得到的新文件已经可以满足大部分实验的需求,如果需要批量处理相关文件,只需要在循环处调用相应的工具类即可。而该部分是通过使用面向对象的思想,将标准文件变得更加好用。
在处理数据之前,我们首先创建一个txtData的bean类,篇幅有限。这里省略了相应的get与set方法。
public class txtData {
private Integer year;
private Integer month;
private Integer day;
private Integer hour;
private Integer minute;
private Integer second;
private Double lat;
private Double lon;
private List<Double> alt;
private List<Double> temp;
}
我们知道,大气资料最重要的就是它的时间特征,空间特征,以及数据本身。所以在txtData类中,我们分别给了年月日时分秒,经纬度,以及海拔温度等参数。有了这个类,我们就可以面向txtData进行编程了。这样有一个好处,那就是不论以后是探空气球数据,大气掩星数据,再分析资料数据,还是其他数据,我们都可以面向txtData进行编程处理。
该部分,我们就以上述生成的探空标准文件为例,进行相应的数据处理。我们首先创建一个工具类readRaStandTxtUtil,意思就是读取探空标准数据工具类。
package roDataDemo.utils;
import lombok.extern.slf4j.Slf4j;
import roDataDemo.domain.igra2Station;
import roDataDemo.domain.txtData;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import static java.lang.Double.parseDouble;
/**
* 完成于2021-8-4-10:24
* 之前使用Matlab将ra数据提取出存到了txt格式的文件里面。
* 本工具类就是将txt文件读进来,将一个txt的数据存入到一个txtData对象里面。
* 这里面的try,catch太多了,肯定是可以精简的,但是,我这会懒得搞,如果有大哥帮忙指出,万分感谢
*/
@Slf4j
public class readRaStandTxtUtil {
/**
* @param fileName 将要读的txt的文件名
* @return 返回了一个txtData对象,包含掩星的经度,纬度,发生时间,以及经过三次样条插值过后的海拔,以及温度数据
*/
public static txtData getTxtData(String fileName, String stationListName) {
File file = new File(fileName);
List<igra2Station> igra2StationList = getIgra2StationList(stationListName);
txtData txtData = new txtData();
try {
if (file.isFile() && file.exists()) {
InputStreamReader read;
read = new InputStreamReader(new FileInputStream(file));
BufferedReader bufferedReader = new BufferedReader(read);
String lineTxt;
int i = 0;
ArrayList<Double> altList = new ArrayList<Double>();
ArrayList<Double> tempList = new ArrayList<Double>();
while ((lineTxt = bufferedReader.readLine()) != null) {
//业务代码,如果lineTxt的长度大于多少,就记录
i++;
if (i == 1) {
//这里还有待优化,因为把格式定死了
int intYear = Integer.parseInt(lineTxt.substring(12, 18).trim());
//做的好一点,判断这个值是否存在
int intMonth = Integer.parseInt(lineTxt.substring(18, 21).trim());
int intDay = Integer.parseInt(lineTxt.substring(21, 24).trim());
int intHour = Integer.parseInt(lineTxt.substring(24, 27).trim());
int intMinute = Integer.parseInt(lineTxt.substring(27, 29).trim());
int intSecond = Integer.parseInt(lineTxt.substring(29, 32).trim());
txtData.setYear(intYear);
txtData.setMonth(intMonth);
txtData.setDay(intDay);
txtData.setHour(intHour);
txtData.setMinute(intMinute);
txtData.setSecond(intSecond);
String station = lineTxt.substring(1, 12);
assert igra2StationList != null;
for (igra2Station igra2Station : igra2StationList) {
if (igra2Station.getName().equals(station)) {
txtData.setLat(igra2Station.getLat());
txtData.setLon(igra2Station.getLon());
}
}
}
if (i > 1) {
String alt = lineTxt.substring(0, lineTxt.indexOf(" ") + 1);
String temp = lineTxt.substring(lineTxt.indexOf(" ") + 1);
double doubleAlt = parseDouble(alt.trim());
double doubleTemp = parseDouble(temp.trim());
altList.add(doubleAlt);
tempList.add(doubleTemp);
}
}
txtData.setAlt(altList);
txtData.setTemp(tempList);
read.close();
return txtData;
}
} catch (Exception ignored) {
}
System.out.println("没数据");
return null;
}
/**
* @param fileName 存放探空气球站点数据的txt文档,可以写死。
* @return 返回一个存放站点数据类的集合
*/
public static List<igra2Station> getIgra2StationList(String fileName) {
File file = new File(fileName);
try {
if (file.isFile() && file.exists()) {
InputStreamReader read;
read = new InputStreamReader(new FileInputStream(file));
BufferedReader bufferedReader = new BufferedReader(read);
String lineTxt;
ArrayList<igra2Station> stationList = new ArrayList<igra2Station>();
while ((lineTxt = bufferedReader.readLine()) != null) {
igra2Station igra2Station = new igra2Station();
igra2Station.setName(lineTxt.substring(0, 11));
igra2Station.setLat(parseDouble(lineTxt.substring(12, 18)));
igra2Station.setLon(parseDouble(lineTxt.substring(21, 28)));
stationList.add(igra2Station);
}
return stationList;
}
} catch (Exception ignored) {
}
return null;
}
}
需要注意的是,在该工具类中,有两个静态方法,这里需要对第二个静态方法getIgra2StationList进行相关解释,在1.2的数据分析中,我们提到过在探空数据中并没有直接的给出测站的经纬度信息,所以我们在生成txtData时,就会缺少相关经纬度信息。为了解决这个问题,我们采用的处理办法是,我们新建了一个igra2Station的bean类。
public class igra2Station {
private String name;
private Double lat;
private Double lon;
}
这个类很简单,包含了测站名,测站经纬度三个信息。我们在getIgra2StationList中对igra2-station-list.txt进行了相关读取,并返回了包含所有测站信息的一个List
3.工程地址
为了方便小伙伴们进行重复实验操作,我将该工程上传到了gitee上(github访问速度实在太慢了)。下载工程请点我!!!
4.展望一波
不知道有没有小伙伴会做桌面软件的,我们可以合作把这些程序打包,做成可视化的工具,这样就能让其他小伙伴减少学习成本,能更好的对数据进行相关处理,减少不必要的时间。如果有,可以联系我,让我们做大做强,再创辉煌,哈哈哈哈。
下一篇我们将会对大气掩星数据进行相关处理,并最终也生成txtData对象。敬请期待。
版权属于: 依依东望