Parsing TensorBoard data locally

Dawid Laszuk published on
2 min, 279 words

TensorBoard seems to be the de facto standard tool for displaying your ML experiment data. There are some alternatives, but likely thanks to its "being first" and "by Google" stamps, TensorBoard (TB) is still the most common one. Many of these alternatives are far superior, especially if you're using something other than TensorFlow, but TB is still the first go-to.

A good thing about TB is that it stores data locally. A bad thing about TB is that it doesn't provide a nice way to extract that data. (No, clicking one graph at a time and then doing "save as csv" is a terrible approach when there are hundreds of graphs to compare.) To be fair, there is "a way" to convert experiment data into, for example, pandas DataFrames, but since it requires uploading and downloading data from someone's server, I'm going to politely ignore the audacity. OK, it might be the best solution if you have a super-fast internet connection, an SSD, and no RAM left on your computer, but otherwise it's a bazooka shooting a fly.

Do it locally. TensorBoard stores your data in TensorFlow TFEvent records, and TensorFlow already has an API, summary_iterator, to deal with them. It isn't magic. Actually, here is a link to my Python gist that does just that: it converts TensorBoard data to a pandas DataFrame. Note that you need TensorFlow and pandas installed to use it, which I assume you have since you're storing data as TFEvents and want to convert it to a pandas DataFrame.

Here it is embedded:

def convert_tb_data(root_dir, sort_by=None):
    """Convert local TensorBoard data into Pandas DataFrame.

    Function takes the root directory path and recursively parses
    all events data.
    If the `sort_by` value is provided then it will use that column
    to sort values; typically `wall_time` or `step`.

    *Note* that the whole data is converted into a DataFrame.
    Depending on the data size this might take a while. If it takes
    too long then narrow it to some sub-directories.

    Parameters:
        root_dir: (str) path to root dir with tensorboard data.
        sort_by: (optional str) column name to sort by.

    Returns:
        pandas.DataFrame with [wall_time, name, step, value] columns.

    """
    import os

    import pandas as pd
    from tensorflow.python.summary.summary_iterator import summary_iterator

    def convert_tfevent(filepath):
        return pd.DataFrame([
            parse_tfevent(e) for e in summary_iterator(filepath) if len(e.summary.value)
        ])

    def parse_tfevent(tfevent):
        return dict(
            wall_time=tfevent.wall_time,
            name=tfevent.summary.value[0].tag,
            step=tfevent.step,
            value=float(tfevent.summary.value[0].simple_value),
        )

    columns_order = ['wall_time', 'name', 'step', 'value']

    out = []
    for (root, _, filenames) in os.walk(root_dir):
        for filename in filenames:
            if "events.out.tfevents" not in filename:
                continue
            file_full_path = os.path.join(root, filename)
            out.append(convert_tfevent(file_full_path))

    # Concatenate (and sort) all partial individual dataframes
    all_df = pd.concat(out)[columns_order]
    if sort_by is not None:
        all_df = all_df.sort_values(sort_by)

    return all_df.reset_index(drop=True)


if __name__ == "__main__":
    dir_path = "/home/kretyn/projects/ai-traineree/runs/"
    exp_name = "CartPole-v1_2021-01-26_11:02"
    df = convert_tb_data(f"{dir_path}/{exp_name}")
    print(df.head())
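Once you have everything in a single long-format DataFrame, comparing those hundreds of graphs becomes plain pandas work. As a minimal sketch (the values here are made up, standing in for whatever your TFEvents actually contain), you can pivot so that each metric name becomes its own column, indexed by step:

```python
import pandas as pd

# Stand-in for the output of convert_tb_data: one row per logged scalar,
# with [wall_time, name, step, value] columns.
df = pd.DataFrame({
    "wall_time": [1.0, 1.1, 2.0, 2.1],
    "name": ["loss", "reward", "loss", "reward"],
    "step": [0, 0, 1, 1],
    "value": [0.9, 10.0, 0.5, 20.0],
})

# Pivot into wide format: one column per metric, one row per step.
wide = df.pivot(index="step", columns="name", values="value")
print(wide)
```

From there, `wide.plot()` or a per-column comparison across experiments is a one-liner, instead of a hundred "save as csv" clicks.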