Loading from columnar data

If you prefer to manipulate your data before converting it into a graph, or want to load from files directly, Raphtory can ingest columnar data and convert it into node and edge updates.

Raphtory's load_edges(), load_nodes(), load_edge_metadata(), and load_node_metadata() functions can ingest data from any columnar source that implements the Arrow C Stream interface. This includes:

  • CSV files (direct path)
  • Parquet files (direct path)
  • Folders containing mixed CSV and Parquet files
  • Pandas DataFrames
  • Polars DataFrames
  • DuckDB query results
  • Any other Arrow-compatible data source

Schema handling: Raphtory can automatically infer types from typed sources (Parquet, Pandas, DuckDB). For untyped sources like CSV, you can either let Raphtory interpret the data or specify an explicit schema using the schema parameter with PropType values. This is also useful when adding data to a graph that already has properties with established types – the schema ensures new data is cast to match.

Loading from CSV files

The simplest approach is to pass a CSV file path directly to load_edges() or load_nodes(). No external libraries needed. You can also pass a folder path containing multiple CSV files (or a mix of CSV and Parquet files) – Raphtory will load all files in the folder.

In this example we're ingesting network traffic data that includes different types of interactions between servers. For CSV files, we can specify an explicit schema to ensure numeric columns like data_size_MB are parsed as floats rather than strings. You can also use csv_options to customize parsing (e.g., delimiter, quote character, escape character):

Loading from Pandas DataFrames

If you prefer to manipulate your data in Pandas first – for example to transform timestamps or filter rows – you can pass the DataFrame directly. Types are inferred from pandas dtypes:

Loading from DuckDB

For larger datasets or SQL-based transformations, DuckDB integrates seamlessly. DuckDB query results can be passed directly to Raphtory:

Loading from Parquet files

Apache Parquet files can be loaded directly by path. Parquet files include embedded type information, so Raphtory automatically uses the correct types:

Adding metadata separately

In some instances you may want to break the ingestion into multiple stages, adding metadata to existing nodes/edges in a separate step. This is common when you have metadata in a different data source than your main graph data.

Use load_edge_metadata() and load_node_metadata() for this:

Metadata can only be added to nodes and edges that already exist in the graph. If you attempt to add metadata to non-existent entities, Raphtory will raise an error.

Function parameters

These functions take optional arguments covering everything we have seen in the earlier direct-updates examples.

Use layer or node_type when all rows in your data share the same value; use layer_col or node_type_col when the value varies per row.

load_edges

| Parameter | Description |
| --- | --- |
| data | File path, DataFrame, or Arrow-compatible source |
| src | Source node column name |
| dst | Destination node column name |
| time | Timestamp column name |
| properties | List of temporal property column names (values that change over time) |
| metadata | List of constant property column names (values that don't change) |
| shared_metadata | Dictionary of metadata applied to all edges |
| layer | Explicit layer name for all edges |
| layer_col | Column name to read layer from (cannot be used with layer) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options (delimiter, quote, escape, etc.) |

load_nodes

| Parameter | Description |
| --- | --- |
| data | File path, DataFrame, or Arrow-compatible source |
| id | Node ID column name |
| time | Timestamp column name |
| properties | List of temporal property column names |
| metadata | List of constant property column names |
| shared_metadata | Dictionary of metadata applied to all nodes |
| node_type | Explicit node type for all nodes |
| node_type_col | Column name to read node type from (cannot be used with node_type) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options |

load_edge_metadata

| Parameter | Description |
| --- | --- |
| data | File path, DataFrame, or Arrow-compatible source |
| src | Source node column name |
| dst | Destination node column name |
| metadata | List of metadata column names |
| shared_metadata | Dictionary of metadata applied to all edges |
| layer | Explicit layer name for all edges |
| layer_col | Column name to read layer from (cannot be used with layer) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options |

load_node_metadata

| Parameter | Description |
| --- | --- |
| data | File path, DataFrame, or Arrow-compatible source |
| id | Node ID column name |
| metadata | List of metadata column names |
| shared_metadata | Dictionary of metadata applied to all nodes |
| node_type | Explicit node type for all nodes |
| node_type_col | Column name to read node type from (cannot be used with node_type) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options |