This article is a detailed introduction to the wbopendata command, intended as technical support for cross-country comparative analysis.
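For global development and the global problems that keep emerging from it, access to data on the world's countries has never seemed more important than it is today. Major public institutions such as the United Nations, the World Bank, and the World Health Organization make vast amounts of data available to users. Among them, the World Bank, one of the core institutions shaping global development, has built the World Bank Open Databases, which bring together data since 1960 covering 256 countries or regions. They include World Development Indicators, Global Development Finance, Africa Development Indicators, Doing Business, Education Statistics, Enterprise Surveys, Gender Statistics, Health Nutrition and Population Statistics, Millennium Development Goals (MDG; note that the MDGs have since been superseded by the Sustainable Development Goals), Worldwide Governance Indicators, and many other important resources. Faced with this sea of data, users need a way to pinpoint the variables they care about and move quickly to analysis. Fortunately, World Bank developers provide a third-party Stata command that retrieves data by connecting to the World Bank Open Data API: wbopendata.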
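wbopendata can be used in Stata in two ways: through a point-and-click dialog (the no-code route) or through typed commands. Either way, you get direct access to the latest World Bank data with no manual downloading or file management, which greatly streamlines the intermediate steps of data analysis. The current version of the API covers data sources with IDs running up to 88, 21 topics, and close to 20,000 variables. Moreover, with a few simple options, the dataset returned by the command is already in the layout needed for panel data analysis in Stata. Below, we first describe the advantages of the command, then introduce the two ways of calling it, and finally use concrete examples to show what this approach can do.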
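To illustrate the limitations of the traditional way of collecting data, take the World Development Indicators (WDI) in the World Bank Open Databases as an example. WDI is a compilation of high-quality, internationally comparable statistics on global development and the fight against poverty. The database contains 1,400 time-series indicators for 217 economies and more than 40 country groups, and many indicators go back more than 50 years. WDI covers the following main themes: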
(1) Poverty and inequality (poverty, prosperity, consumption, income distribution)
(2) People (population dynamics, education, labor, health, gender)
(3) Environment (agriculture, climate change, energy, biodiversity, water, sanitation)
(4) Economy (growth, economic structure, income and savings, trade, labor productivity)
(5) States and markets (business, stock markets, military, communications, transport, technology)
(6) Global links (debt, trade, aid dependency, refugee, tourism, migration)
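This database is the main data source for the World Bank's flagship reports. The World Bank has also built the Atlas of Sustainable Development Goals on top of WDI, which visualizes the 17 Sustainable Development Goals and presents them as interactive, story-driven online content.
We first open the World Bank's WDI page from within Stata: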
view browse "https://datatopics.worldbank.org/world-development-indicators/"
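Once familiar with what WDI contains, we can download the data we want to analyze. The World Bank maintains a large data warehouse called DataBank, which holds WDI along with other databases; WDI can be found there and the required data downloaded. Open the page below to reach the data selection screen shown in the figure below.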
view browse "https://databank.shihang.org/source/world-development-indicators#"
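Opening the downloaded raw data, we see variables for country, series, and time (see the figure below).
The raw data have three main problems:
(1) Years are stored as columns, one column per year, so the year columns need to be gathered into a single year variable. In terms of data structure, the raw data shown above are in "wide" format (wide data); in Stata we need to convert them to "long" format (long data).
(2) The dataset uses ".." (two periods) for missing values, but Stata only recognizes "." (a single period). If the data are read in as-is, each such column is treated as a string variable rather than a numeric one, so it cannot be used in calculations directly after import and needs further processing.
(3) The last few rows of the raw table contain notes and similar information, which are read in as observations if imported as-is.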
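The three problems can be illustrated on a tiny made-up table that mimics a DataBank export. This is only a sketch: the variable names (yr2000 and so on), the values, and the footer text are assumptions for illustration, not the layout of a real download.

```stata
* Toy data standing in for a raw DataBank export (all values are made up)
clear
input str3 countrycode str20 series str8 yr2000 str8 yr2001 str8 yr2002
"CHN" "SP.POP.0014.TO.ZS" "24.8" "24.0" ".."
"FRA" "SP.POP.0014.TO.ZS" "18.9" ".."   "18.7"
""    "Data from database" ""    ""     ""
end

* Problem (3): drop footer/note rows, which carry no country code
drop if countrycode == ""

* Problem (2): turn ".." into an empty string, then convert strings to numbers
foreach v of varlist yr2000-yr2002 {
    replace `v' = "" if `v' == ".."
    destring `v', replace
}

* Problem (1): wide -> long, one row per country-series-year
reshape long yr, i(countrycode series) j(year)
rename yr value
list, sepby(countrycode)
```

The result has one observation per country and year, with the two ".." cells stored as numeric missing values.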
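As the above shows, using raw data from the World Bank website for statistical analysis in Stata requires some intermediate processing. There are two ways to deal with these problems. The first is to clean the raw data by hand. This approach is rigorous and leaves a record of every processing step, but it is inconvenient: because the database spans so many topics, a new dataset has to be downloaded for each topic analyzed. The second is to use a third-party API module to fetch a specific dataset directly from the World Bank database. Its biggest advantage is convenience, but it may suffer from lagging connections and a processing pipeline that the user cannot control.
Both approaches aim to produce long-format panel data, and the main job of this technical note is to walk through each of them in detail. To put the data to use, we will also draw figures and build descriptive statistics tables with it.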
db wbopendata
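wbopendata is a third-party module developed for Stata users that lets them retrieve data in real time through the API from inside Stata. This kind of application grew out of the World Bank's Open Data Initiative, launched on April 20, 2010, and currently covers databases such as World Development Indicators, Africa Development Indicators, Global Finance Indicators, and Doing Business Indicators.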
ssc install wbopendata
db wbopendata
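Running the commands above opens the selection dialog shown below.
Selecting one or more countries/regions under "Country - WDI or All series" returns all variables for those countries/regions. Alternatively, selecting a variable under "Indicators - All series" returns that variable for all countries/regions (if no country is selected) or for whichever countries/regions are selected. A more common need may be all countries' variables on a single topic; in that case, pick a topic under "Topics -- WDI series", for example "3 - Economy & Growth". For the indicators the World Bank lists under each topic, see: https://data.worldbank.org/Indicator?tab=all
Note that Country and Indicators can be combined (and/or), but Topics cannot be selected together with either of them. In other words, once a topic is chosen, countries and/or indicators cannot be specified any further.
After selecting the data, tick "Import the data in long" and "Replace data in memory". The former returns the data reshaped to long format; the latter clears the data in memory before loading the selection.
Example: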
Country - CHN;FRA;JPN;USA;IND;
Indicators - SP.POP.0014.TO.ZS - Population ages 0-14 (% of total population)
Format Option - tick both boxes
After clicking Submit, the data loaded into Stata look like the figure below:
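In fact, just like the analysis options built into Stata's menus, the wbopendata dialog is essentially a wrapper around a series of packaged commands. For example, the data obtained above through the dialog can also be obtained with the command below: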
wbopendata, language(en - English) country(CHN;FRA;JPN;USA;IND;) topics() indicator(SP.POP.0014.TO.ZS - Population ages 0-14 (% of total population)) clear long
wbopendata, country(chn - China) clear long // China is used as an example here; several countries can be specified instead
wbopendata,country(chn - China) indicator(SP.POP.0014.TO.ZS - Population ages 0-14 (% of total population)) clear long
wbopendata, indicator(SP.POP.0014.TO.ZS - Population ages 0-14 (% of total population)) clear long
wbopendata, topics(2 - Aid Effectiveness) clear long
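Note that none of these combinations covers the case "all countries (or regions), all variables". This design is sensible, because in data analysis the questions we study usually target a specific domain. If the full dataset is needed, it can be downloaded directly from the World Bank website. If nothing is selected in the dialog, Stata's Results window shows the following: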
*Users need to select either a country, a topic, or an indicator. Please try again.
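The dialog offers no way to restrict the years, but this can still be done in code.
For example, to restrict the five-country, single-indicator request shown earlier to the years 2000 through 2010, add year(2000:2010). The modified command is: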
wbopendata, country(CHN;FRA;JPN;USA;IND;) indicator(SP.POP.0014.TO.ZS - Population ages 0-14 (% of total population)) year(2000:2010) clear long
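Syntax of the wbopendata command
As the above shows, the dialog is quick and convenient, but it does not cover every feature of the wbopendata command (specifying particular years, for example). A wiser learning strategy is therefore to build on the point-and-click experience and accomplish our goals with command code. Doing so lets us manage the whole process more transparently and makes it easier to share with collaborators and readers. The syntax of wbopendata is: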
* wbopendata, Parameters [Options]
The parameters are as follows:
/* Parameters Description
---------------------------------------------------------------------------------------------------------
country(acronym) list of country code (accepts multiples)
(and/or)
indicator(acronym) list of indicator code(accepts multiples)
(or)
topics(acronym) topic code (only accepts one) */
The options are as follows:
/* Options Description
---------------------------------------------------------------------------------------------------------
long imports the data in the long format.
clear replace data in memory.
latest keep only the latest available value of a single indicator.
nometadata omits the display of the metadata.
year(date1:date2) time interval (in yearly, quarterly or monthly depending on the series).
language(language) select the language.
full adds full list of country attributes.
iso adds 2 digits ISO codes to country attributes.
update query query the current vintage of indicators and country metadata available.
update check checks the availability of new indicators and country metadata available for download.
update all refreshes the indicators and country metadata information.
match(varname) merge country attributes using WDI countrycodes.
projection World Bank population estimates and projections (HPP) .
metadataoffline download all indicator metadata information and generates 71 sthlp files in your local machine. */
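As a hedged illustration of how the parameters and options listed above combine, the sketch below requests one indicator for two countries over a fixed period and then declares the result as a panel. It assumes wbopendata is installed and the World Bank API is reachable; the variable name cid is an illustrative choice, and the returned values depend on the data vintage at the time of the call.

```stata
* Assumes: ssc install wbopendata has been run and an internet connection is available
wbopendata, country(CHN;IND) indicator(SP.POP.0014.TO.ZS) ///
    year(2000:2010) long clear nometadata

* Declare the panel structure: countries as units, years as the time variable
encode countrycode, generate(cid)   // cid is an illustrative name
xtset cid year
xtdescribe
```

Because long is specified, each row is one country-year, which is the layout xtset expects.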
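Updating the wbopendata command
As noted above, the wbopendata command works by calling a third-party API. Because the data are updated continually, the API changes with them, and the metadata stored on our own computer needs to be refreshed from time to time. The steps below can serve as routine maintenance.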
wbopendata, update query // query the current version; returns the information below
/*
Indicators update status
Existing Number of Indicators: 17473
Last check for updates: 8 Jul 2020 13:56:14
New update available: none (as of 8 Jul 2020 13:56:14)
Current update level: 8 Jul 2020 13:56:14
Country metadata: 304
Last country check: 8 Jul 2020 13:56:14
Current country update level: 8 Jul 2020 14:01:16
Possible actions
Check for available updates (or type -wbopendata, update check detail -)
See current documentation on indicators list, Regions,
Administrative Regions, Income Levels, and Lending Types
*/
wbopendata, update check detail // check in more detail what can be updated
/*
Indicators update status
Existing Number of Indicators: 17473
Last check for updates: 8 Jul 2020 13:56:14
New update available: yes (as of 15 May 2022 17:02:15}
Current update level: 8 Jul 2020 13:56:14
Country metadata: 304
New update available: yes (as of 15 May 2022 17:02:15}
Last country check: 8 Jul 2020 13:56:14
Current country update level: 8 Jul 2020 14:01:16
Downloading indicators list 1/3...
Downloading indicators list 1/3...COMPLETED!
Downloading indicators list 2/3...
Downloading indicators list 2/3...COMPLETED!
Downloading indicators list 3/3...
Downloading indicators list 3/3...COMPLETED!
Preparing indicator data for check...
Preparing indicator data for check...COMPLETED!
-----------------------------------------------------------------------------------------------------------------
Source Number of indicators
-----------------------------------------------------------------------------------------------------------------
01 Doing Business 201 (SAME)
02 World Development Indicators 1429 (CHANGED) old value: 1422
03 Worldwide Governance Indicators 30 (CHANGED) old value: 36
05 Subnational Malnutrition Database 5 (SAME)
11 Africa Development Indicators 831 (CHANGED) old value: 838
12 Education Statistics 4230 (CHANGED) old value: 3622
13 Enterprise Surveys 115 (CHANGED) old value: 89
14 Gender Statistics 378 (CHANGED) old value: 299
15 Global Economic Monitor 37 (CHANGED) old value: 38
16 Health Nutrition and Population Sta... 141 (CHANGED) old value: 115
18 IDA Results Measurement System 1 (SAME)
19 Millennium Development Goals 20 (SAME)
20 Quarterly Public Sector Debt 564 (SAME)
22 Quarterly External Debt Statistics ... 1800 (SAME)
23 Quarterly External Debt Statistics ... 256 (SAME)
24 Poverty and Equity 43 (CHANGED) old value: 39
25 Jobs 3 (CHANGED) old value: 1
27 Global Economic Prospects 1 (SAME)
28 Global Financial Inclusion 776 (SAME)
29 The Atlas of Social Protection - In... 2714 (CHANGED) old value: 2809
30 Exporter Dynamics Database – Indi... 98 (SAME)
32 Global Financial Development 114 (CHANGED) old value: 111
33 G20 Financial Inclusion Indicators 131 (SAME)
34 Global Partnership for Education 678 (SAME)
35 Sustainable Energy for All 11 (SAME)
36 Statistical Capacity Indicators 25 (SAME)
37 LAC Equity Lab 211 (SAME)
39 Health Nutrition and Population Sta... 420 (SAME)
40 Population estimates and projection... 187 (CHANGED) old value: 175
41 Country Partnership Strategy for In... 185 (SAME)
43 Adjusted Net Savings 2 (SAME)
45 Indonesia Database for Policy and E... 261 (SAME)
46 Sustainable Development Goals 2 (SAME)
50 Subnational Population 1 (SAME)
54 Joint External Debt Hub 28 (SAME)
57 WDI Database Archives 842 (CHANGED) old value: 866
59 Wealth Accounts 52 (CHANGED) old value: 54
60 Economic Fitness 2 (SAME)
61 PPPs Regulatory Quality 4 (SAME)
63 Human Capital Index 27 (SAME)
64 Worldwide Bureaucracy Indicators 192 (CHANGED) old value: 87
65 Health Equity and Financial Protect... 315 (SAME)
66 Logistics Performance Index 12 (SAME)
67 PEFA 2011 107 (SAME)
69 Global Financial Inclusion and Cons... 347 (SAME)
70 Economic Fitness 2 2 (SAME)
71 International Comparison Program [I... 36 (CHANGED) old value: 17
73 Global Financial Inclusion and Cons... 19 (SAME)
75 Environment, Social and Governance ... 4 (SAME)
78 ICP 2017 47 (SAME)
80 Gender Disaggregated Labor Database... 6 (CHANGED) old value: 0
81 International Debt Statistics - DS... 525 (CHANGED) old value: 0
82 Global Public Procurement 111 (CHANGED) old value: 0
83 Statistical Performance Indicators ... 119 (CHANGED) old value: 0
84 Education Policy 469 (CHANGED) old value: 0
86 Global Jobs Indicators Database [JO... 746 (CHANGED) old value: 0
87 Country Climate and Development Rep... 250 (CHANGED) old value: 0
88 Food Prices for Nutrition 21 (CHANGED) old value: 0
-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
Topics Number of indicators
-----------------------------------------------------------------------------------------------------------------
01 Agriculture and Rural Development 49 (SAME)
02 Aid Effectiveness 77 (CHANGED) old value: 73
03 Economy and Growth 306 (CHANGED) old value: 297
04 Education 1014 (CHANGED) old value: 1476
05 Energy and Mining 53 (SAME)
06 Environment 145 (CHANGED) old value: 139
07 Financial Sector 188 (CHANGED) old value: 195
08 Health 651 (CHANGED) old value: 637
09 Infrastructure 76 (CHANGED) old value: 75
10 Social Protection and Labor 2149 (CHANGED) old value: 2214
11 Poverty 145 (SAME)
12 Private Sector 196 (CHANGED) old value: 190
13 Public Sector 109 (CHANGED) old value: 100
14 Science and Technology 13 (SAME)
15 Social Development 35 (SAME)
16 Urban Development 27 (SAME)
17 Gender 311 (CHANGED) old value: 323
18 Millenium development goals 26 (SAME)
19 Climate Change 82 (SAME)
20 External Debt 516 (SAME)
21 Trade 155 (CHANGED) old value: 152
{opt } 2 (CHANGED) old value: 0
-----------------------------------------------------------------------------------------------------------------
Possible actions
Download available updates (or type -wbopendata, update all-)
See current documentation on indicators list, Regions,
Administrative Regions, Income Levels, and Lending Types
*/
wbopendata, update all // run the update
/*
Indicators update status
Existing Number of Indicators: 17473
New Number of Indicators: 20123
Last check for updates: 15 May 2022 17:02:15
New update available: yes (as of 15 May 2022 17:14:04}
Current update level: 8 Jul 2020 13:56:14
Country metadata: 304
New country metadata: 299
Last country check: 15 May 2022 17:02:15
Current country update level: 8 Jul 2020 14:01:16
UPDATE IN PROGRESS...
Downloading country metadata...
Downloading country metadata... COMPLETED!
Processing country metadata...
Processing country metadata... COMPLETED!
Processing country documentation...
See Region
See Administrative Region
See Income Level
See Lending Type
Processing country documentation... COMPLETED!
FULL UPDATE COMPLETED.
*/
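The World Bank database is enormous: the update query above reports 17,473 existing indicators (Existing Number of Indicators), rising to 20,123 after the update. Clearly we cannot recall the many indicators under each topic from memory, so being able to look up indicators quickly by data source and by topic matters a great deal. The command below fetches the latest source and topic classifications and stores them locally as help files, which can be consulted at any time to locate variables of interest in this large database.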
wbopendata, metadataoffline
* Full output not shown
The data source codes (Source Code) obtained this way are:
Source Code Description
---------------------------------------------------------------------------------------------------------
01 Doing Business
02 World Development Indicators
03 Worldwide Governance Indicators
05 Subnational Malnutrition Database
11 Africa Development Indicators
12 Education Statistics
13 Enterprise Surveys
14 Gender Statistics
15 Global Economic Monitor
16 Health Nutrition and Population Statistics
18 IDA Results Measurement System
19 Millennium Development Goals
20 Quarterly Public Sector Debt
22 Quarterly External Debt Statistics SDDS
23 Quarterly External Debt Statistics GDDS
24 Poverty and Equity
25 Jobs
27 Global Economic Prospects
28 Global Financial Inclusion
29 The Atlas of Social Protection: Indicators of Resilience and Equity
30 Exporter Dynamics Database – Indicators at Country-Year Level
32 Global Financial Development
33 G20 Financial Inclusion Indicators
34 Global Partnership for Education
35 Sustainable Energy for All
36 Statistical Capacity Indicators
37 LAC Equity Lab
39 Health Nutrition and Population Statistics by Wealth Quintile
40 Population estimates and projections
41 Country Partnership Strategy for India (FY2013 - 17)
43 Adjusted Net Savings
45 Indonesia Database for Policy and Economic Research
46 Sustainable Development Goals
50 Subnational Population
54 Joint External Debt Hub
57 WDI Database Archives
59 Wealth Accounts
60 Economic Fitness
61 PPPs Regulatory Quality
63 Human Capital Index
64 Worldwide Bureaucracy Indicators
65 Health Equity and Financial Protection Indicators
66 Logistics Performance Index
67 PEFA 2011
69 Global Financial Inclusion and Consumer Protection Survey
70 Economic Fitness 2
71 International Comparison Program (ICP) 2005
73 Global Financial Inclusion and Consumer Protection Survey (Internal)
75 Environment, Social and Governance (ESG) Data
78 ICP 2017
80 Gender Disaggregated Labor Database (GDLD)
81 International Debt Statistics: DSSI
82 Global Public Procurement
83 Statistical Performance Indicators (SPI)
84 Education Policy
86 Global Jobs Indicators Database (JOIN)
87 Country Climate and Development Report (CCDR)
88 Food Prices for Nutrition
The topic codes (Topics Code) are:
Topics Code Description
---------------------------------------------------------------------------------------------------------
01 Agriculture and Rural Development
02 Aid Effectiveness
03 Economy and Growth
04 Education
05 Energy and Mining
06 Environment
07 Financial Sector
08 Health
09 Infrastructure
10 Social Protection and Labor
11 Poverty
12 Private Sector
13 Public Sector
14 Science and Technology
15 Social Development
16 Urban Development
17 Gender
18 Millenium development goals
19 Climate Change
20 External Debt
21 Trade
In practice we need to look up a particular data source or topic quickly. For that, use the help files:
help wbopendata_sourceid_indicators## // ## is the source ID: 01-88
help wbopendata_topicid_indicators## // ## is the topic ID: 01-21
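Application examples of wbopendata
The point of retrieving data is to analyze it. Once the data are in Stata, we can run descriptive statistics on the resulting cross-sectional or panel data, or go further with inferential analysis. Here we take the female labor force participation rate (ages 15-64) as an example, retrieve it with wbopendata, and plot it on a map to see how the variable is distributed across the world.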
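First, using the help files introduced above, look up the basic information for the variable SL.TLF.ACTI.FE.ZS (female labor force participation rate, ages 15-64). As shown below, the variable comes from the WDI database (02 World Development Indicators) and belongs to the Social Development topic (15 Social Development). The help file also gives the variable's definition (Source Notes) and its data source (Source Organization).
* SL.TLF.ACTI.FE.ZS // female labor force participation rate, ages 15-64 (%)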
SL.TLF.ACTI.FE.ZS - Labor force participation rate, female (% of female population ages 15-64) (modeled ILO estimate)
Source 02 World Development Indicators
Topics 15 Social Development
Source Notes Labor force participation rate is the proportion of the population
ages 15-64 that is economically active: all people who supply
labor for the production of goods and services during a specified
period.
Source Organization International Labour Organization, ILOSTAT database. Data retrieved
on June 15, 2021.
Next, use the tempfile command to reserve a temporary file, then load the data we are interested in.
tempfile tmp // tempfile assigns names to the specified local macro names that may be used as names for temporary files.
// When the program or do-file concludes, any datasets created with these assigned names are erased.
wbopendata, indicator(SL.TLF.ACTI.FE.ZS) long clear latest // load the data; the latest option keeps the most recent available year
/* Metadata for indicator SL.TLF.ACTI.FE.ZS
-----------------------------------------------------------------------------------------------------------------------
Name: Labor force participation rate, female (% of female population ages 15-64) (modeled ILO estimate)
-----------------------------------------------------------------------------------------------------------------------
Collection: 2 World Development Indicators
-----------------------------------------------------------------------------------------------------------------------
Description: Labor force participation rate is the proportion of the population ages 15-64 that is economically
active: all people who supply labor for the production of goods and services during a specified period.
-----------------------------------------------------------------------------------------------------------------------
Note: International Labour Organization, ILOSTAT database. Data retrieved on June 15, 2021.
-----------------------------------------------------------------------------------------------------------------------
Topic(s): 10 Social Protection and Labor ; 15 Social Development
----------------------------------------------------------------------------------------------------------------------- */
describe // inspect the dataset
/* Observations: 235
Variables: 12 17 May 2022 15:26
-----------------------------------------------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-----------------------------------------------------------------------------------------------------------------------
countrycode str3 %9s Country Code
countryname str56 %56s Country Name
region str3 %9s Region Code
regionname str28 %28s Region Name
adminregion str3 %9s Administrative Region Code
adminregionname str52 %52s Administrative Region Name
incomelevel str3 %9s Income Level Code
incomelevelname str19 %19s Income Level Name
lendingtype str3 %9s Lending Type Code
lendingtypename str14 %14s Lending Type Name
year int %9.0g Year
sl_tlf_acti_f~s float %8.0g SL.TLF.ACTI.FE.ZS
-----------------------------------------------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved. */
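The describe output abbreviates the indicator variable as sl_tlf_acti_f~s. Judging from its variable label and from the name used in the map command later on, the variable name appears to be the indicator code in lower case with periods replaced by underscores. A small sketch of that mapping follows; the macro names are illustrative, and the naming rule is inferred from this output rather than taken from the wbopendata documentation.

```stata
* Derive the (assumed) Stata variable name from an indicator code
local code "SL.TLF.ACTI.FE.ZS"
local varname = strlower(subinstr("`code'", ".", "_", .))
display "`varname'"
```

This can be handy when looping over several indicator codes and referring to the resulting variables by name.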
Finally, process the data, merge it with the map data, and draw the map.
sort countrycode
save `tmp', replace
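sysuse world-c, clear // the "world-c" dataset holds the coordinates of countries/regions worldwide, keyed by "_ID"
save world-c.dta, replace // save a local copy for use when drawing the map
sysuse world-d, clear // the "world-d" dataset holds attribute information for countries/regions worldwide, keyed by "_ID"
merge countrycode using `tmp' // merges "world-d" with the dataset stored in `tmp'; the tempfile is not strictly required, it is used here as a convenient way to hold the data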
sum year
local avg = string(`r(mean)',"%16.1f")
grmap sl_tlf_acti_fe_zs using "world-c.dta", id(_ID) ///
clnumber(20) fcolor(Reds2) ocolor(none ..) ///
title("Labor force participation rate, female (% of female population ages 15-64)", size(*0.8)) ///
legstyle(3) legend(ring(1) position(3)) ///
note("Source: World Development Indicators (latest available year: `avg') using Azevedo, J.P. (2011) wbopendata: Stata module to " "access World Bank databases, Statistical Software Components S457234 Boston College Department of Economics.", size(*.7))
graph export "$figures/global_labor_participation_1519_women.png", replace
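The tempfile-and-merge pattern used above can be tried on toy data without any download. The sketch below uses the newer merge 1:1 syntax instead of the old-style merge in the example; the country codes are real ISO codes, but the values and the _ID numbers are made up for illustration.

```stata
* Toy "indicator" dataset, held in a temporary file (values are made up)
clear
input str3 countrycode float lfp
"CHN" 68.6
"FRA" 68.2
"IND" 22.3
end
tempfile tmp
save `tmp'

* Toy "map attribute" dataset keyed by countrycode (IDs are made up)
clear
input str3 countrycode int _ID
"CHN" 1
"FRA" 2
"USA" 3
end

* One-to-one merge on countrycode; _merge records where each row came from
merge 1:1 countrycode using `tmp'
tabulate _merge
```

Rows found only in the map data (USA here) would be drawn without a value, and rows found only in the indicator data (IND here) have no polygon to draw, so inspecting _merge before mapping is a useful check.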
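In this example we look at the relationship between the poverty headcount ratio and GDP per capita across countries around 2019 (the code requests the most recent available values through the latest option).
The two variables are si.pov.dday (Poverty headcount ratio at $1.90 a day (2011 PPP), % of population) and ny.gdp.pcap.pp.kd (GDP per capita, PPP, constant 2011 international $).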
clear all
wbopendata, indicator(si.pov.dday; ny.gdp.pcap.pp.kd) clear long latest
graph twoway ///
(scatter si_pov_dday ny_gdp_pcap_pp_kd, msize(*.5) mlabel(countrycode) mlabsize(*.6) mlabangle(0)) ///
(lowess si_pov_dday ny_gdp_pcap_pp_kd) , ///
xlabel(, labsize(small)) ylabel(, labsize(small)) ///
xtitle("GDP per capita, PPP (constant 2011 international $)", size(*0.8)) ///
ytitle("Poverty headcount ratio at the International Poverty Line", size(*0.8)) ///
legend(off) graphregion(fcolor(white) lcolor(white)) plotregion(fcolor(white)) ///
note("Source: World Development Indicators (latest available year as of 2012-08-08) using Azevedo, J.P. (2011) wbopendata: Stata module to " "access World Bank databases, Statistical Software Components S457234 Boston College Department of Economics.", ///
size(*.6))
graph export "$figures/scatter_plot.png", replace
clear all
wbopendata, indicator(si.pov.dday; ny.gdp.pcap.pp.kd) clear long latest
graph twoway ///
(scatter si_pov_dday ny_gdp_pcap_pp_kd, msize(*.5)) ///
(scatter si_pov_dday ny_gdp_pcap_pp_kd if regionname == "Aggregates", msize(*.8) mlabel(countryname) mlabsize(*.8) mlabangle(25)) ///
(lowess si_pov_dday ny_gdp_pcap_pp_kd) , ///
xlabel(, labsize(small)) ylabel(, labsize(small)) ///
xtitle("GDP per capita, PPP (constant 2011 international $)", size(*0.8)) ///
ytitle("Poverty headcount ratio at the International Poverty Line", size(*0.8)) ///
legend(off) graphregion(fcolor(white) lcolor(white)) plotregion(fcolor(white)) ///
note("Source: World Development Indicators (latest available year as of 2012-08-08) using Azevedo, J.P. (2011) wbopendata: Stata module to " "access World Bank databases, Statistical Software Components S457234 Boston College Department of Economics.", size(*.6))
graph export "$figures/scatter_plot1.png", replace
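This part introduces a graphical method for describing income distributions, Pen's Parade curve. It was first proposed by the Dutch economist Jan Pen in 1971. "Parade" refers to a sequence, specifically the values of an indicator lined up from lowest to highest. The method is a form of quantile function and is often used to compare income inequality across regions and periods. The curve plots the maximum value within each percentile (max value of each percentile).
In this example we again use the poverty headcount ratio (Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)). Taking the different types of regions as the basic units, we compute each region's annual change, diff_pov, and order the values from largest to smallest (the y axis in the figure below); the x axis gives the cumulative percentage of the ordered values. The curve also carries a few labeled points. On the y axis each of these points sits at the mean for the labeled region, that is, the region's average annual poverty reduction over the years; its x coordinate is the cumulative percentage at which that average falls in the ordered distribution. For example, the third point from the left has a y value equal to the average annual poverty reduction of Sub-Saharan Africa (-.579), which lies at roughly 45% on the x axis, meaning that about 45% of the distribution shows a change in poverty no greater than that value (-.579).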
clear all
wbopendata, indicator(si.pov.dday) clear long
drop if si_pov_dday == .
sort countryname year
bysort countryname : gen diff_pov = (si_pov_dday-si_pov_dday[_n-1])/(year-year[_n-1])
encode region, gen(region1)
encode countryname, gen(region2)
keep if regionname == "Aggregates"
alorenz diff_pov, gp points(30) xdecrease markvar(region2) mlabangle(30) mlabsize(vsmall) /// *gp plot the Pen's Parade curve (max value of each percentile)
ytitle("Change in Poverty (p.p.)", size(*0.8)) xtitle("Proportion of regional episodes of poverty reduction (%)", size(*0.8)) ///
legend(off) title("Poverty reduction", size(*0.8)) ///
graphregion(fcolor(white) lcolor(white)) plotregion(fcolor(white)) ///
legend(off) note("Source: World Development Indicators using Azevedo, J.P. (2011) wbopendata: Stata module to " "access World Bank databases, Statistical Software Components S457234 Boston College Department of Economics.", size(*.7))
graph export "$figures/alorenz.png", replace
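The idea of reading a cumulative percentage off an ordered set of changes can be sketched with Stata's built-in cumul command on toy numbers. This is not how alorenz computes its curve internally (that is not shown in this article); the values are made up and the variable names are illustrative.

```stata
* Toy annual changes in poverty (percentage points); not World Bank data
clear
input float diff_pov
-2.0
-1.2
-0.58
-0.3
0.1
end

* Empirical cumulative distribution: share of episodes with a change <= each value
cumul diff_pov, generate(cum_share)
sort diff_pov
gen cum_pct = 100 * cum_share
list diff_pov cum_pct
```

With five episodes, the third-smallest change (-0.58) sits at a cumulative share of 60%, which is read the same way as the 45% figure for Sub-Saharan Africa in the text.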
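The Sustainable Development Goals are a human development program led by the United Nations. With the World Bank database we can examine how far specific targets have been met at a given stage. Staying with the poverty headcount ratio, we ask whether the world's regions have met their poverty reduction targets since 1990. Specifically, taking 2008 as an example, we compute a target (relative to the 1990 baseline, the poverty rate should have fallen by 25% of the baseline level by 2008) and relate the 2008 target poverty rate to the actual 2008 poverty rate. The figure below shows this relationship. Points above the 45-degree line are regions that missed the target, points below it are regions that exceeded the target, and points on the line are regions that met it exactly.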
wbopendata, indicator(si.pov.dday) clear long
drop if si_pov_dday == .
keep if regionname == "Aggregates" // examine goal attainment at the regional level
sort countryname year
gen baseline = si_pov_dday if year == 1990
sort countryname baseline
bysort countryname : replace baseline = baseline[1] if baseline == . // baseline[1] is the first value within each group
gen target_08_reduction = baseline/4 // target: by 2008, reduce the poverty rate by 1/4 of its 1990 baseline level
gen present = si_pov_dday if year == 2008 // actual poverty rate in 2008
sort countryname present
bysort countryname : replace present = present[1] if present == .
gen pov_08_target = (baseline-target_08_reduction) // target poverty rate for 2008
sort countryname year
gen angle45x = .
gen angle45y = .
replace angle45x = 0 in 1
replace angle45y = 0 in 1
replace angle45x = 80 in 2
replace angle45y = 80 in 2
graph twoway ///
(scatter present pov_08_target if year == 2008, mlabel(countryname) mlabsize(vsmall) mlabangle(-20) msize(*0.5)) ///
(line angle45y angle45x,lwidth(thin) lpattern(dash)), ///
legend(off) xtitle("Target poverty rate (2008)", size(*0.8)) ytitle("Actual poverty rate (2008)", size(*0.8)) ///
graphregion(fcolor(white) lcolor(white)) plotregion(fcolor(white)) ///
title("Poverty reduction: outcomes versus targets", size(*0.8)) ///
note("Source: World Development Indicators (latest available year: 2008) using Azevedo, J.P. (2011) wbopendata: Stata module to " "access World Bank databases,Statistical Software Components S457234 Boston College Department of Economics.", size(*.7))
graph export "$figures/mdg_target.png", replace
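The target arithmetic in this example can be checked by hand on two made-up regions. The names, values, and variable names below are illustrative only; the rule (target = baseline minus one quarter of the baseline) is the one used in the code above.

```stata
* Toy regions with made-up 1990 and 2008 poverty rates (% of population)
clear
input str12 region float pov1990 float pov2008
"Region A" 40 25
"Region B" 20 18
end

* Target for 2008: the 1990 rate reduced by one quarter of its level
gen target_2008 = pov1990 - pov1990/4

* A region meets the target when its actual 2008 rate is at or below the target
gen byte met = pov2008 <= target_2008
list
```

Region A has a target of 30 and an actual rate of 25, so it lies below the 45-degree line (target met); Region B has a target of 15 and an actual rate of 18, so it lies above the line (target missed).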
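* Note: this article is a first draft and will continue to be revised and expanded.
If you find problems, errors, or omissions, please let us know. Thank you!
* Version history:
Version 1: 2022-05-18
Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, contact cloudcommunity@tencent.com for removal.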