# Comparing Employment Data: Revelio Labs vs BLS

This notebook allows users to explore and compare employment data from two sources: **Revelio Labs** and the **U.S. Bureau of Labor Statistics (BLS)**.  

It provides tools to analyze both:

1. **Revisions in employment estimates** – preliminary employment figures updated in subsequent releases. Understanding these revisions is important for evaluating the accuracy and reliability of employment measures over time.  
2. **Levels of seasonally adjusted employment** – the employment levels reported by Revelio Labs and the BLS, compared side by side.

In this notebook, you can:  
- Examine month-over-month changes in employment for different data releases.  
- Compute and visualize revision rates (differences between releases) for both sources.  
- Compare patterns and magnitudes of revisions between Revelio Labs and the BLS.  
- Compare and visualize the level of seasonally adjusted employment from both sources.

This tool is intended for economists, analysts, and researchers interested in labor market dynamics and data quality.


## Data Requirements

To use this notebook, please download the following files and place them in the working directory:

1. **`employment_national.csv`**  
   Contains the most recent estimates of employment from Revelio Labs starting from **January 2021**.

2. **`employment_national_history.csv`**  
   Contains historical estimates of employment from Revelio Labs starting from **January 2021**.

3. **`bls_revisions.xlsx`**  
   Contains historical revisions in **seasonally adjusted (SA) employment** from the BLS Current Employment Statistics (CES).  
   You can download it from: [BLS CES Revisions Data](https://www.bls.gov/web/empsit/cesnaicsrev.htm)

> **Note:** This notebook also uses the [FRED API](https://fred.stlouisfed.org/) to fetch certain economic series.  
> You will need a valid FRED API key to run those parts of the notebook.

Make sure all three files are available in the working directory before running the analysis.
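
As a quick pre-flight check, the file list above can be verified with a few lines of Python. This is a minimal sketch: the helper name `missing_files` is hypothetical, and the directory name `toexport` simply mirrors the `workspace` value used later in this notebook.

```python
import os

# File names taken from the Data Requirements list above.
REQUIRED_FILES = [
    "employment_national.csv",
    "employment_national_history.csv",
    "bls_revisions.xlsx",
]

def missing_files(workspace, required=REQUIRED_FILES):
    """Return the required file names that are not present in `workspace`."""
    return [name for name in required
            if not os.path.isfile(os.path.join(workspace, name))]

# 'toexport' is an assumption: replace it with your own working directory.
missing = missing_files("toexport")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```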

## Library Requirements

This notebook relies on the following Python libraries. Make sure they are installed in your environment before running the notebook:

* `numpy`
* `os` (standard library)
* `pandas`
* `dateutil.relativedelta`
* `plotly.graph_objects`
* `plotly.subplots`
* `matplotlib.pyplot`
* `fredapi` (for accessing the FRED API)
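
One way to confirm that the libraries are importable before running the cells is sketched below. The helper `missing_packages` is hypothetical. `openpyxl` is not in the list above; it is included here on the assumption that pandas will use it as the engine for reading the `.xlsx` file.

```python
import importlib.util

# Top-level package names for the third-party libraries listed above.
# 'openpyxl' is an addition (assumption): pandas typically needs it to read .xlsx files.
PACKAGES = ["numpy", "pandas", "dateutil", "plotly", "matplotlib", "fredapi", "openpyxl"]

def missing_packages(packages=PACKAGES):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

print("Not installed:", missing_packages())
```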

In [1]:
import numpy as np
import os
import pandas as pd
from dateutil.relativedelta import relativedelta
import plotly.graph_objects as go
import matplotlib.pyplot as plt
from plotly.subplots import make_subplots
from fredapi import Fred

# Revelio Labs colors
crs = [
'#48AFF4','#5DDA93',
'#B582D9','#F9B26C',
'#6797E0','#E76A87',
'#1CCEC5','#F4D35D',
'#F391D4','#9CA4B4']

In [2]:
# Include the name of your working directory
workspace = 'toexport'

In [3]:
# Change the end month below to the most recent month

end_month = '2025-08'

# (1) Levels of Employment

Here we compare the level of seasonally adjusted employment estimated by Revelio Labs with the level estimated by the Bureau of Labor Statistics in the Current Employment Statistics (CES).

* Revelio Labs' CSV file includes both seasonally adjusted and non-seasonally adjusted employment levels.
* To obtain the BLS employment level, use the FRED API (series `PAYEMS`).
* Make sure to insert your FRED API key below.

Retrieve BLS employment data from FRED

In [4]:
fred = Fred(api_key='<YOUR FRED KEY>')
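
To avoid pasting the key into the notebook, it could be read from an environment variable instead. This is a sketch under assumptions: the helper `get_fred_key` is hypothetical, and the variable name `FRED_API_KEY` is a convention chosen here, not something this notebook requires.

```python
import os

def get_fred_key(env_var="FRED_API_KEY"):
    """Read the FRED API key from an environment variable and fail loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your FRED API key."
        )
    return key

# Usage in this notebook (needs a real key and network access):
# fred = Fred(api_key=get_fred_key())
```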

In [6]:
bls = pd.DataFrame({series:fred.get_series(series) for series in ['PAYEMS']})
bls = bls.reset_index()
bls.rename(columns={'index': 'date', 'PAYEMS': 'bls_employment_sa'}, inplace=True) 
bls['date'] = pd.to_datetime(bls['date'], format='%Y/%m/%d')

bls = bls[bls['date'] >= '2021-01-01']        
bls = bls.reset_index(drop=True)
bls.tail()

Unnamed: 0,date,bls_employment_sa
50,2025-03-01,159275.0
51,2025-04-01,159433.0
52,2025-05-01,159452.0
53,2025-06-01,159466.0
54,2025-07-01,159539.0


Retrieve RL data from **`employment_national.csv`**  

The file includes the following columns:
* **month**: month of the employment observation, ranging from January 2021 to the most recent month
* **employment_nsa**: total employment in thousands, not seasonally adjusted
* **employment_sa**: total employment in thousands, seasonally adjusted

In [9]:
rl = pd.read_csv(os.path.join(workspace, 'employment_national.csv'), index_col=0).reset_index(drop=True)

In [11]:
# Plot employment over time

fig = go.Figure()

# BLS SA 
fig.add_trace(
    go.Scatter(
        x=bls["date"], y=bls["bls_employment_sa"],
        mode="lines",
        marker_color=crs[3],
        name="BLS"
    )
)


# RL SA
fig.add_trace(
    go.Scatter(
        x=rl["month"], y=rl["employment_sa"],
        mode="lines",
        marker_color=crs[0],
        name="Revelio Labs"
    )
)



fig.update_layout(
    yaxis = dict(zeroline = False, 
                 #showgrid = False,
                 tickformat = ',.0f',
                 title = 'Thousands of persons',
                 gridcolor = '#EAECF0', gridwidth = 1,
                 tickfont = dict(family = "Source Sans 3 Regular", size = 15, color = "#2D426A")),
    xaxis = dict(zeroline = True, 
                 zerolinecolor = '#EAECF0', 
                 zerolinewidth = 1,
                 #showgrid = False,
                 #tickformat = '.1%',
                 range = ['2021-01', end_month],
                 gridcolor = '#EAECF0', 
                 gridwidth = 1, 
                 showticklabels = True,
                 tickfont = dict(family = "Source Sans 3 Regular", size = 15, color = "#2D426A")),
    showlegend = True,
    legend = dict(yanchor = 'bottom',
                  xanchor = 'center', y = -0.2,
                  x = 0.5,  
                  font = dict(family = "Source Sans 3 Regular", size = 16, color = "#2D426A"),
                  orientation = 'h',
                  traceorder = 'normal'),
   title = dict(text = f'Total non-farm employment (Thousands of persons)', 
                 font_size=28,
                 yanchor = 'top', 
                 y = 0.92, 
                 xanchor = 'left', 
                 x = 0.02),
    plot_bgcolor = 'rgba(0,0,0,0)',
    height = 600,
    #width = 768,
    hoverlabel = dict(bgcolor = "white",
                      bordercolor = "#2D426A",
                      font = dict(size = 12, family = 'Source Sans 3 Regular', color = "#2D426A")
    )
)

fig.update_layout(margin=dict(t=125, r = 185))

fig.show()
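
Beyond the visual comparison, the gap between the two level series can be tabulated by merging them on month. The sketch below uses toy values, not real data; the helper `level_gap` is hypothetical, and it assumes the Revelio Labs `month` column holds `YYYY-MM` strings.

```python
import pandas as pd

def level_gap(bls, rl):
    """Merge the two sources by month and compute the gap in SA employment levels."""
    left = bls[["date", "bls_employment_sa"]].rename(columns={"date": "month"})
    right = rl[["month", "employment_sa"]].copy()
    # Put both month columns on the same datetime footing before merging.
    left["month"] = pd.to_datetime(left["month"])
    right["month"] = pd.to_datetime(right["month"])
    merged = left.merge(right, on="month", how="inner")
    # Gap is Revelio Labs minus BLS, in thousands and as a percent of the BLS level.
    merged["gap"] = merged["employment_sa"] - merged["bls_employment_sa"]
    merged["gap_pct"] = 100 * merged["gap"] / merged["bls_employment_sa"]
    return merged

# Toy values for illustration only.
bls_toy = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-02-01"]),
    "bls_employment_sa": [100.0, 200.0],
})
rl_toy = pd.DataFrame({
    "month": ["2021-01", "2021-02"],
    "employment_sa": [99.0, 204.0],
})
gap_toy = level_gap(bls_toy, rl_toy)
print(gap_toy)
```

With the real data frames from this section, the call would be `level_gap(bls, rl)`.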


# (2) Changes in Employment

Here we compare the month-over-month changes in seasonally adjusted total non-farm employment estimated by Revelio Labs with those estimated by the Bureau of Labor Statistics in the Current Employment Statistics (CES).

* We use the data frames obtained in (1).

In [12]:
bls['employment_change'] = bls['bls_employment_sa'].diff()
rl["employment_change"] = rl["employment_sa"].diff()

In [14]:
# Plot month-over-month employment changes over time

fig = go.Figure()

# BLS SA 
fig.add_trace(
    go.Scatter(
        x=bls["date"], y=bls["employment_change"],
        mode="lines",
        marker_color=crs[3],
        name="BLS"
    )
)


# RL SA
fig.add_trace(
    go.Scatter(
        x=rl["month"], y=rl["employment_change"],
        mode="lines",
        marker_color=crs[0],
        name="Revelio Labs"
    )
)



fig.update_layout(
    yaxis = dict(zeroline = False, 
                 #showgrid = False,
                 tickformat = ',.0f',
                 title = 'Thousands of persons',
                 gridcolor = '#EAECF0', gridwidth = 1,
                 tickfont = dict(family = "Source Sans 3 Regular", size = 15, color = "#2D426A")),
    xaxis = dict(zeroline = True, 
                 zerolinecolor = '#EAECF0', 
                 zerolinewidth = 1,
                 #showgrid = False,
                 #tickformat = '.1%',
                 range = ['2021-01', end_month],
                 gridcolor = '#EAECF0', 
                 gridwidth = 1, 
                 showticklabels = True,
                 tickfont = dict(family = "Source Sans 3 Regular", size = 15, color = "#2D426A")),
    showlegend = True,
    legend = dict(yanchor = 'bottom',
                  xanchor = 'center', y = -0.2,
                  x = 0.5,  
                  font = dict(family = "Source Sans 3 Regular", size = 16, color = "#2D426A"),
                  orientation = 'h',
                  traceorder = 'normal'),
   title = dict(text = f'Monthly change in total non-farm employment<br>(Thousands of Persons)<br>', 
                 font_size=28,
                 yanchor = 'top', 
                 y = 0.92, 
                 xanchor = 'left', 
                 x = 0.02),
    plot_bgcolor = 'rgba(0,0,0,0)',
    height = 600,
    #width = 768,
    hoverlabel = dict(bgcolor = "white",
                      bordercolor = "#2D426A",
                      font = dict(size = 12, family = 'Source Sans 3 Regular', color = "#2D426A")
    )
)

fig.update_layout(margin=dict(t=125, r = 185))

fig.show()
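
A simple numerical companion to the chart is the correlation between the two monthly-change series and their mean absolute difference. The sketch below is illustrative only: `compare_changes` is a hypothetical helper and the input values are toy numbers, not real data.

```python
import pandas as pd

def compare_changes(bls, rl):
    """Summarize how closely the two monthly-change series track each other."""
    left = bls[["date", "employment_change"]].rename(
        columns={"date": "month", "employment_change": "bls_change"})
    right = rl[["month", "employment_change"]].rename(
        columns={"employment_change": "rl_change"})
    left["month"] = pd.to_datetime(left["month"])
    right["month"] = pd.to_datetime(right["month"])
    # Drop months where either change is missing (e.g. the first month after diff()).
    merged = left.merge(right, on="month").dropna()
    return {
        "n_months": len(merged),
        "correlation": merged["bls_change"].corr(merged["rl_change"]),
        "mean_abs_diff": (merged["bls_change"] - merged["rl_change"]).abs().mean(),
    }

# Toy levels for illustration only; changes are derived with diff() as in the cell above.
months = ["2021-01", "2021-02", "2021-03", "2021-04"]
bls_toy = pd.DataFrame({"date": pd.to_datetime(months),
                        "bls_employment_sa": [100.0, 110.0, 130.0, 135.0]})
rl_toy = pd.DataFrame({"month": months,
                       "employment_sa": [100.0, 108.0, 124.0, 130.0]})
bls_toy["employment_change"] = bls_toy["bls_employment_sa"].diff()
rl_toy["employment_change"] = rl_toy["employment_sa"].diff()

stats_toy = compare_changes(bls_toy, rl_toy)
print(stats_toy)
```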


# (3) Revision

We compare revisions to the monthly change in total non-farm employment across data releases for Revelio Labs and the BLS.


### Revelio Labs Revisions
These can be retrieved from **`employment_national_history.csv`**,
which contains historical estimates of employment from Revelio Labs starting from **January 2021**.

The file includes the following columns:
* **month**: month of the employment observation, ranging from January 2021 to the most recent month
* **version**: the month in which the data was generated (release date). For example, version = '2025-06' is the June 2025 data release, which covers all months up to that point. It is the first release for June 2025, the second release for May 2025, and the third release for April 2025
* **employment_nsa**: total employment in thousands, not seasonally adjusted
* **employment_sa**: total employment in thousands, seasonally adjusted

In [15]:
df_rl = pd.read_csv(os.path.join(workspace, 'employment_national_history.csv'), index_col=0).reset_index(drop=True)
df_rl

Unnamed: 0,month,version,employment_nsa,employment_sa
0,2021-01,2021-08,146178.7,146382.1
1,2021-02,2021-08,146308.8,146691.6
2,2021-03,2021-08,146728.1,147094.0
3,2021-04,2021-08,146955.2,147376.5
4,2021-05,2021-08,147657.8,147693.8
...,...,...,...,...
1507,2025-03,2025-07,156994.1,157192.6
1508,2025-04,2025-07,156969.5,157279.9
1509,2025-05,2025-07,157286.1,157331.7
1510,2025-06,2025-07,157444.9,157434.2


### Compute month-over-month change in employment

Compute the month-over-month change in seasonally adjusted employment (`employment_sa`) within each release. We later compare these changes across releases for each month and against the BLS revisions.

In [16]:
vars_to_diff = ['employment_sa']

df_rl = df_rl.sort_values(by=['version', 'month'])

for var in vars_to_diff:
    df_rl[f'{var}_change'] = df_rl.sort_values('month').groupby('version')[var].diff()

df_rl

Unnamed: 0,month,version,employment_nsa,employment_sa,employment_sa_change
0,2021-01,2021-08,146178.7,146382.1,
1,2021-02,2021-08,146308.8,146691.6,309.5
2,2021-03,2021-08,146728.1,147094.0,402.4
3,2021-04,2021-08,146955.2,147376.5,282.5
4,2021-05,2021-08,147657.8,147693.8,317.3
...,...,...,...,...,...
1507,2025-03,2025-07,156994.1,157192.6,113.1
1508,2025-04,2025-07,156969.5,157279.9,87.3
1509,2025-05,2025-07,157286.1,157331.7,51.8
1510,2025-06,2025-07,157444.9,157434.2,102.5
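
The key point of the cell above is that the difference is taken within each `version`, so the first month of every release has no previous value and comes out as `NaN`. The toy example below (made-up numbers, two releases) illustrates this behavior.

```python
import pandas as pd

# Made-up values: two releases ("versions") covering the same three months.
toy = pd.DataFrame({
    "month":   ["2021-01", "2021-02", "2021-03", "2021-01", "2021-02", "2021-03"],
    "version": ["2021-03", "2021-03", "2021-03", "2021-04", "2021-04", "2021-04"],
    "employment_sa": [100.0, 103.0, 107.0, 101.0, 105.0, 108.0],
})

# Sort so that months are in order inside each version, then difference per version.
toy = toy.sort_values(["version", "month"])
toy["employment_sa_change"] = toy.groupby("version")["employment_sa"].diff()
print(toy)
```

The first row of each version is `NaN`; the remaining changes are 3.0 and 4.0 for version `2021-03`, and 4.0 and 3.0 for version `2021-04`.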


### Examine revisions in Seasonally Adjusted Employment - Employment (SA)

We calculate how the estimated change in seasonally adjusted employment is revised across the first three releases for each month.

Example:
* The data release in April 2025 (version = '2025-04') is the first release for April, the second release for March, and the third release for February.
* To get the difference between the first and second releases for March, we subtract the change in seasonally adjusted employment reported in version = '2025-03' from the one reported in version = '2025-04'.
* From this release, we can also obtain the difference between the first and third releases for February: we subtract the change reported in version = '2025-02' from the one reported in version = '2025-04'.
* Below we define the relative months and compute the revisions in the change in employment.
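
The release numbering described in the example can be expressed as a small function: the release number is the count of months between the observation month and the version, plus one. The helper name `release_number` is hypothetical and is not used elsewhere in this notebook.

```python
import pandas as pd

def release_number(month, version):
    """Return 1 if `version` is the first release for `month`, 2 for the second, and so on."""
    m = pd.Period(month, freq="M")
    v = pd.Period(version, freq="M")
    return (v.year - m.year) * 12 + (v.month - m.month) + 1

print(release_number("2025-04", "2025-04"))  # 1: first release for April
print(release_number("2025-03", "2025-04"))  # 2: second release for March
print(release_number("2025-02", "2025-04"))  # 3: third release for February
```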

In [18]:
var = 'employment_sa_change'

In [19]:
# create revisions dataframe
revisions = pd.DataFrame(columns=['first','second','third'])
# revisions.index = pd.to_datetime(revisions.index)
for b in df_rl['version'].unique():
    b_dt = pd.to_datetime(str(b)[:4]+'-'+str(b)[-2:])
    m1 = b_dt - relativedelta(months=0)
    m2 = b_dt - relativedelta(months=1)
    m3 = b_dt - relativedelta(months=2)
    revisions.loc[m1,'first'] = df_rl.loc[(df_rl['version']==b)&(pd.to_datetime(df_rl['month'])==m1),
                                       var].sum()
    revisions.loc[m2,'second'] = df_rl.loc[(df_rl['version']==b)&(pd.to_datetime(df_rl['month'])==m2),
                                       var].sum()
    revisions.loc[m3,'third'] = df_rl.loc[(df_rl['version']==b)&(pd.to_datetime(df_rl['month'])==m3),
                                       var].sum()
rl_revisions = revisions.sort_index()
rl_revisions = rl_revisions.reset_index().rename(columns={'index': 'month'})
rl_revisions['rev_2_minus_1'] = (rl_revisions['second']-rl_revisions['first'])
rl_revisions['rev_3_minus_1'] = (rl_revisions['third']-rl_revisions['first'])
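
One caveat with the loop above: `.sum()` on an empty selection returns 0, so a release that does not exist yet (for example, the third release of the latest month) is recorded as 0 rather than as missing. An alternative sketch, assuming `YYYY-MM` strings in `month` and `version`, derives the release number per row and pivots, which leaves missing releases as `NaN`. The function name `revisions_table` is hypothetical and the input values are made up.

```python
import pandas as pd

def revisions_table(df, var="employment_sa_change"):
    """Build first/second/third release columns per month, keeping missing releases as NaN."""
    d = df[["month", "version", var]].copy()
    m = pd.to_datetime(d["month"])
    v = pd.to_datetime(d["version"])
    # Release number: 1 when version == month, 2 one month later, and so on.
    d["release"] = (v.dt.year - m.dt.year) * 12 + (v.dt.month - m.dt.month) + 1
    d = d[d["release"].between(1, 3)]
    wide = d.pivot(index="month", columns="release", values=var)
    wide = wide.reindex(columns=[1, 2, 3]).rename(
        columns={1: "first", 2: "second", 3: "third"})
    wide.columns.name = None
    wide["rev_2_minus_1"] = wide["second"] - wide["first"]
    wide["rev_3_minus_1"] = wide["third"] - wide["first"]
    return wide.reset_index()

# Made-up changes for three months across three releases.
toy = pd.DataFrame({
    "month":   ["2021-01", "2021-01", "2021-02", "2021-01", "2021-02", "2021-03"],
    "version": ["2021-01", "2021-02", "2021-02", "2021-03", "2021-03", "2021-03"],
    "employment_sa_change": [10.0, 12.0, 20.0, 13.0, 18.0, 30.0],
})
rev_toy = revisions_table(toy)
print(rev_toy)
```

In this toy table, January has all three releases, February lacks a third release, and March has only a first release, so the corresponding revision columns stay empty instead of showing a spurious value.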

### BLS revisions

* Use the `bls_revisions.xlsx` file listed under Data Requirements.
* Source: CES revisions `https://www.bls.gov/web/empsit/cesnaicsrev.htm`
* The file contains the differences in the monthly change in employment between releases.

In [20]:
bls_revisions = pd.read_excel(os.path.join(workspace, 'bls_revisions.xlsx'), index_col=0).reset_index(drop=True)
bls_revisions = bls_revisions[bls_revisions['month']>= '2021-08-01']

### Plot Revisions to compare

In [22]:
bls_revisions["month"] = pd.to_datetime(bls_revisions["month"])
rl_revisions["month"] = pd.to_datetime(rl_revisions["month"])
bls_revisions = bls_revisions.sort_values("month")
rl_revisions = rl_revisions.sort_values("month")

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=("Difference between first &<br>second releases", 
                    "Difference between first &<br>third releases"),
    shared_yaxes=True,   # share y-axis
)


# --- Left subplot: revisions 1→2 
fig.add_trace(
    go.Bar(
        x=bls_revisions["month"], 
        y=bls_revisions["rev_2_minus_1"],
        name="BLS", 
        marker_color=crs[3],
        hovertemplate="In %{x}, the difference in monthly employment changes between the second and first releases in BLS data was %{y} thousand jobs.<extra></extra>"
    ),
    row=1, col=1
)
fig.add_trace(
    go.Bar(
        x=rl_revisions["month"], 
        y=rl_revisions["rev_2_minus_1"],
        name="Revelio Labs", 
        marker_color=crs[0],
        hovertemplate="In %{x}, the difference in monthly employment changes between the second and first releases in Revelio Labs data was %{y} thousand jobs.<extra></extra>"
    ),
    row=1, col=1
)

# --- Right subplot: revisions 1→3 ---
# Note: this panel plots the third-minus-first revision (rev_3_minus_1);
# the BLS file is assumed to carry a column with that name.
fig.add_trace(
    go.Bar(
        x=bls_revisions["month"], 
        y=bls_revisions["rev_3_minus_1"],
        name="BLS", 
        marker_color=crs[3],
    #    width=2_000_000_000,  # ~23 days in ms → covers most of a month
        hovertemplate="In %{x}, the difference in monthly employment changes between the third and first releases in BLS data was %{y} thousand jobs.<extra></extra>"
    ),
    row=1, col=2
)
fig.add_trace(
    go.Bar(
        x=rl_revisions["month"], 
        y=rl_revisions["rev_3_minus_1"],
        name="Revelio Labs", 
   #     width=2_000_000_000,  # ~23 days in ms → covers most of a month
        marker_color=crs[0],
        hovertemplate="In %{x}, the difference in monthly employment changes between the third and first releases in Revelio Labs data was %{y} thousand jobs.<extra></extra>"
    ),
    row=1, col=2
)

fig.update_layout(
    yaxis1 = dict(zeroline = False, 
                 #showgrid = False,
                 tickformat = '.0f',
                 #range = [-0.5, 0.5],
                 gridcolor = 'white', gridwidth = 1,
                 tickfont = dict(family = "Source Sans 3 Regular", size = 18, color = "#2D426A")),
    yaxis2 = dict(zeroline = False, 
                 #showgrid = False,
                 tickformat = '.1f',
                 #range = [-0.5, 0.5],
                 gridcolor = 'white', gridwidth = 1,
                 tickfont = dict(family = "Source Sans 3 Regular", size = 18, color = "#2D426A")),
    xaxis1 = dict(zeroline = True, 
                 zerolinecolor = 'white',
                  range = ['2021-07', end_month],
                 zerolinewidth = 1,
                 #showgrid = False,
                 #tickformat = '.1%',
                 gridcolor = 'white', 
                 gridwidth = 1, 
                 showticklabels = True,
                 tickfont = dict(family = "Source Sans 3 Regular", size = 15, color = "#2D426A")),
    xaxis2 = dict(zeroline = True, 
                 zerolinecolor = 'white',
                  range = ['2021-07', end_month],
                 zerolinewidth = 1,
                 #showgrid = False,
                 #tickformat = '.1%',
                 gridcolor = 'white', 
                 gridwidth = 1, 
                 showticklabels = True,
                 tickfont = dict(family = "Source Sans 3 Regular", size = 15, color = "#2D426A")),
    showlegend = False,
    legend = dict(yanchor = 'bottom',
                  xanchor = 'center', y = -0.1,
                  x = 0.5,  
                  font = dict(family = "Source Sans 3 Regular", size = 16, color = "#2D426A"),
                  orientation = 'h',
                  traceorder = 'normal'),
   title = dict(text = f'<span style="color:{crs[0]};">Revelio Labs</span> revisions compared to <span style="color:{crs[3]};">BLS</span><br>revisions', 
                yanchor='top', y=0.92, xanchor='left', x=0.02),    
    plot_bgcolor = 'rgba(0,0,0,0)',
    height = 600,
    #width = 768,
    hoverlabel = dict(bgcolor = "white",
                      bordercolor = "#2D426A",
                      font = dict(size = 12, family = 'Source Sans 3 Regular', color = "#2D426A")
    )
)

fig.update_layout(
    bargap=0.1,        # space between groups (increase/decrease this)
    bargroupgap=0.0    # no gap within the same group
)

fig.update_layout(margin=dict(t=120, r = 185))


fig.show()
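
To compare magnitudes numerically, the revision columns can be summarized with a few statistics. The sketch below is one possible summary, not the notebook's own method: `revision_summary` is a hypothetical helper, and the toy input is made up. Columns are passed through `pd.to_numeric` because the Revelio Labs revisions table built in the loop above may hold object-typed values.

```python
import pandas as pd

def revision_summary(revisions, cols=("rev_2_minus_1", "rev_3_minus_1")):
    """Summarize revisions: count, mean, mean absolute value, and root mean square."""
    rows = {}
    for col in cols:
        s = pd.to_numeric(revisions[col], errors="coerce").dropna()
        rows[col] = {
            "n": len(s),
            "mean": s.mean(),
            "mean_abs": s.abs().mean(),
            "rms": (s ** 2).mean() ** 0.5,
        }
    return pd.DataFrame(rows).T

# Made-up revisions (thousands of jobs); None marks a release that is not available.
toy = pd.DataFrame({
    "rev_2_minus_1": [2.0, -2.0, None],
    "rev_3_minus_1": [3.0, None, None],
})
summary_toy = revision_summary(toy)
print(summary_toy)
```

With the real data, calling `revision_summary(rl_revisions)` and `revision_summary(bls_revisions)` over the same date range would give directly comparable figures, provided the BLS table has both revision columns.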
