Loading from columnar data

If you prefer to manipulate your data before converting it into a graph, or want to load from files directly, Raphtory can ingest columnar data and convert it into node and edge updates.

Raphtory's load_edges(), load_nodes(), load_edge_metadata(), and load_node_metadata() functions can ingest data from any columnar source that implements the Arrow C Stream interface. This includes:

  • CSV files (direct path)
  • Parquet files (direct path)
  • Folders containing mixed CSV and Parquet files
  • Pandas DataFrames
  • Polars DataFrames
  • DuckDB query results
  • Any other Arrow-compatible data source

Schema handling: Raphtory can automatically infer types from typed sources (Parquet, Pandas, DuckDB). For untyped sources like CSV, you can either let Raphtory interpret the data or specify an explicit schema using the schema parameter with PropType values. This is also useful when adding data to a graph that already has properties with established types – the schema ensures new data is cast to match.

Loading from CSV files

The simplest approach is to pass a CSV file path directly to load_edges() or load_nodes(). No external libraries needed. You can also pass a folder path containing multiple CSV files (or a mix of CSV and Parquet files) – Raphtory will load all files in the folder.

In this example we're ingesting network traffic data that includes different types of interactions between servers. For CSV files, we can specify an explicit schema to ensure numeric columns like data_size_MB are parsed as floats rather than strings. You can also use csv_options to customize parsing (e.g., delimiter, quote character, escape character):

Loading from Pandas DataFrames

If you prefer to manipulate your data in Pandas first – for example to transform timestamps or filter rows – you can pass the DataFrame directly. Types are inferred from pandas dtypes:

Loading from DuckDB

For larger datasets or SQL-based transformations, DuckDB integrates seamlessly. DuckDB query results can be passed directly to Raphtory:

Loading from Parquet files

Apache Parquet files can be loaded directly by path. Parquet files include embedded type information, so Raphtory automatically uses the correct types:

Adding metadata separately

In some instances you may want to break the ingestion into multiple stages, adding metadata to existing nodes/edges in a separate step. This is common when you have metadata in a different data source than your main graph data.

Use load_edge_metadata() and load_node_metadata() for this:

Metadata can only be added to nodes and edges that already exist in the graph. If you attempt to add metadata to non-existent entities, Raphtory will raise an error.

Function parameters

These functions take optional arguments covering everything we have seen in the earlier direct-updates examples.

Use layer or node_type when all rows in your data share the same value; use layer_col or node_type_col when the value varies per row.

load_edges

| Parameter | Description |
| --- | --- |
| data | File path, DataFrame, or Arrow-compatible source |
| src | Source node column name |
| dst | Destination node column name |
| time | Timestamp column name |
| properties | List of temporal property column names (values that change over time) |
| metadata | List of constant property column names (values that don't change) |
| shared_metadata | Dictionary of metadata applied to all edges |
| layer | Explicit layer name for all edges |
| layer_col | Column name to read layer from (cannot be used with layer) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options (delimiter, quote, escape, etc.) |

load_nodes

| Parameter | Description |
| --- | --- |
| data | File path, DataFrame, or Arrow-compatible source |
| id | Node ID column name |
| time | Timestamp column name |
| properties | List of temporal property column names |
| metadata | List of constant property column names |
| shared_metadata | Dictionary of metadata applied to all nodes |
| node_type | Explicit node type for all nodes |
| node_type_col | Column name to read node type from (cannot be used with node_type) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options |

load_edge_metadata

| Parameter | Description |
| --- | --- |
| data | File path, DataFrame, or Arrow-compatible source |
| src | Source node column name |
| dst | Destination node column name |
| metadata | List of metadata column names |
| shared_metadata | Dictionary of metadata applied to all edges |
| layer | Explicit layer name for all edges |
| layer_col | Column name to read layer from (cannot be used with layer) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options |

load_node_metadata

| Parameter | Description |
| --- | --- |
| data | File path, DataFrame, or Arrow-compatible source |
| id | Node ID column name |
| metadata | List of metadata column names |
| shared_metadata | Dictionary of metadata applied to all nodes |
| node_type | Explicit node type for all nodes |
| node_type_col | Column name to read node type from (cannot be used with node_type) |
| schema | Type mappings using PropType |
| csv_options | CSV parsing options |