Loading from columnar data
If you prefer to manipulate your data before converting it into a graph, or want to load from files directly, Raphtory can ingest columnar data and convert it into node and edge updates.
Raphtory's load_edges(), load_nodes(), load_edge_metadata(), and load_node_metadata() functions can ingest data from any columnar source that implements the Arrow C Stream interface. This includes:
- CSV files (direct path)
- Parquet files (direct path)
- Folders containing mixed CSV and Parquet files
- Pandas DataFrames
- Polars DataFrames
- DuckDB query results
- Any other Arrow-compatible data source
Schema handling: Raphtory can automatically infer types from typed sources (Parquet, Pandas, DuckDB). For untyped sources like CSV, you can either let Raphtory interpret the data or specify an explicit schema using the schema parameter with PropType values. This is also useful when adding data to a graph that already has properties with established types – the schema ensures new data is cast to match.
Loading from CSV files
The simplest approach is to pass a CSV file path directly to load_edges() or load_nodes() – no external libraries are needed. You can also pass a folder path containing multiple CSV files (or a mix of CSV and Parquet files) – Raphtory will load all files in the folder.
In this example we're ingesting network traffic data which includes different types of interactions between servers. For CSV files, we can specify an explicit schema to ensure numeric columns like data_size_MB are parsed as floats rather than strings. You can also use csv_options to customize parsing (e.g., delimiter, quote character, escape character):
Loading from Pandas DataFrames
If you prefer to manipulate your data in Pandas first – for example to transform timestamps or filter rows – you can pass the DataFrame directly. Types are inferred from pandas dtypes:
Loading from DuckDB
For larger datasets or SQL-based transformations, DuckDB integrates seamlessly. DuckDB query results can be passed directly to Raphtory:
Loading from Parquet files
Apache Parquet files can be loaded directly by path. Parquet files include embedded type information, so Raphtory automatically uses the correct types:
Adding metadata separately
In some instances you may want to break the ingestion into multiple stages, adding metadata to existing nodes/edges in a separate step. This is common when you have metadata in a different data source than your main graph data.
Use load_edge_metadata() and load_node_metadata() for this:
Metadata can only be added to nodes and edges which already exist in the graph. If you attempt to add metadata to non-existent entities, Raphtory will raise an error.
Function parameters
These functions have optional arguments to cover everything we have seen in the prior direct updates example.
Use layer or node_type when all rows in your data share the same value. Use layer_col or node_type_col when the values vary per row in your data.
load_edges
| Parameter | Description |
|---|---|
| data | File path, DataFrame, or Arrow-compatible source |
| src | Source node column name |
| dst | Destination node column name |
| time | Timestamp column name |
| properties | List of temporal property column names (values that change over time) |
| metadata | List of constant property column names (values that don't change) |
| shared_metadata | Dictionary of metadata applied to all edges |
| layer | Explicit layer name for all edges |
| layer_col | Column name to read layer from (cannot be used with layer) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options (delimiter, quote, escape, etc.) |
load_nodes
| Parameter | Description |
|---|---|
| data | File path, DataFrame, or Arrow-compatible source |
| id | Node ID column name |
| time | Timestamp column name |
| properties | List of temporal property column names |
| metadata | List of constant property column names |
| shared_metadata | Dictionary of metadata applied to all nodes |
| node_type | Explicit node type for all nodes |
| node_type_col | Column name to read node type from (cannot be used with node_type) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options |
load_edge_metadata
| Parameter | Description |
|---|---|
| data | File path, DataFrame, or Arrow-compatible source |
| src | Source node column name |
| dst | Destination node column name |
| metadata | List of metadata column names |
| shared_metadata | Dictionary of metadata applied to all edges |
| layer | Explicit layer name for all edges |
| layer_col | Column name to read layer from (cannot be used with layer) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options |
load_node_metadata
| Parameter | Description |
|---|---|
| data | File path, DataFrame, or Arrow-compatible source |
| id | Node ID column name |
| metadata | List of metadata column names |
| shared_metadata | Dictionary of metadata applied to all nodes |
| node_type | Explicit node type for all nodes |
| node_type_col | Column name to read node type from (cannot be used with node_type) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options |