DuckDBDataConnector¶
This document describes the DuckDBDataConnector class, which retrieves data from a DuckDB table or from a custom SQL query, and saves data to a specified table in a specified DuckDB instance. The class extends BaseDataConnector.
Configuration¶
Required Configuration¶
duckdb_path: The file path to the DuckDB instance.
Optional Configuration¶
There is no optional configuration.
Default Configuration¶
There is no default configuration.
Methods¶
get_data¶
The get_data() method retrieves data from a DuckDB table, or runs the custom SQL provided, and returns the result as a Pandas dataframe.
def get_data(self, table, sql=None, *args, **kwargs):
Arguments:
table(str): Name of the table to retrieve data from.
sql(str): The optional SQL query to execute. Default value is None.
Returns
data(dataframe): A Pandas dataframe object containing the data.
Example Usage
from lolpop.component import DuckDBDataConnector
config = {
#insert component config here
}
# create an instance of DuckDBDataConnector with the config above
duck_conn = DuckDBDataConnector(conf=config)
# retrieve data from a table by name
df = duck_conn.get_data('your_table_name')
This example creates an instance of the DuckDBDataConnector class and uses the get_data() method to retrieve data from the database.
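The choice between a table name and custom SQL described for get_data() can be sketched as below. This is a minimal illustration under stated assumptions, not the connector's actual code: the helper name build_query is hypothetical, and so is the assumption that a missing sql argument falls back to selecting every row of the table.

```python
def build_query(table, sql=None):
    """Return the SQL to run: the custom query if given, else a full-table select.

    Hypothetical helper for illustration; not part of lolpop.
    """
    if sql is not None:
        # A custom query takes precedence over the table name.
        return sql
    # Quote the identifier so names with spaces or reserved words still work.
    quoted = '"' + table.replace('"', '""') + '"'
    return f"SELECT * FROM {quoted}"


print(build_query("your_table_name"))
print(build_query("your_table_name", sql="SELECT page FROM your_table_name LIMIT 10"))
```

Keeping the query selection in one small function makes it easy to check which SQL would run before touching the database.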
save_data¶
The save_data() method saves data to the specified table in the DuckDB instance. If the table does not exist, it is created with the structure of the dataframe provided. If a column is missing, it is added as nulls. This preserves the structure of the destination table.
def save_data(self, data, table, *args, **kwargs):
Arguments:
data(pandas.DataFrame): Pandas dataframe containing the data to be saved.
table(str): Name of the table to save the data to.
Example Usage
import pandas as pd
from lolpop.component import DuckDBDataConnector
config = {
#insert component config here
}
# create an instance of DuckDBDataConnector with the config above
duck_conn = DuckDBDataConnector(conf=config)
# create a Pandas dataframe object containing data
confluence_table = pd.DataFrame({
'page': ['A', 'B', 'C'],
'editorial': ['Ace', 'Bob', 'Chloe'],
'when': ['2019-07-01', '2019-08-05', '2019-09-10']
})
# create table if it doesn't exist and save data to it
duck_conn.save_data(confluence_table, 'your_table_name')
This example creates an instance of the DuckDBDataConnector class, creates a Pandas dataframe and saves the content of the dataframe as a table in the database using the save_data() method.
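The create-or-align behaviour described for save_data() can be sketched without a database. The helper plan_save below is hypothetical and only illustrates the decision. It assumes that a "missing" column means one that exists in the destination table but not in the dataframe. Columns that appear only in the dataframe are reported separately, because the text above does not say how the connector treats them.

```python
def plan_save(table_columns, data_columns):
    """Plan how to fit a dataframe into a destination table.

    table_columns: column names of the destination table, or None if it does not exist.
    data_columns: column names of the dataframe being saved.
    Hypothetical helper for illustration; not part of lolpop.
    """
    if table_columns is None:
        # No destination table yet: create it from the dataframe's structure.
        return {"action": "create", "columns": list(data_columns),
                "null_filled": [], "extra": []}
    # Columns the table has but the dataframe lacks would be filled with nulls,
    # so the destination table keeps its structure.
    null_filled = [c for c in table_columns if c not in data_columns]
    # Columns only present in the dataframe; handling is left undecided here.
    extra = [c for c in data_columns if c not in table_columns]
    return {"action": "insert", "columns": list(table_columns),
            "null_filled": null_filled, "extra": extra}


print(plan_save(None, ["page", "editorial", "when"]))
print(plan_save(["page", "editorial", "when"], ["page", "when"]))
```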
_load_data¶
This method executes the given SQL command in DuckDB and returns the retrieved data.
def _load_data(self, sql, path, *args, **kwargs):
Arguments:
sql(str): SQL command to execute.
path(str): Path to the DuckDB instance.
Returns
pandas.DataFrame: Fetched data from DuckDB.
_save_data¶
This method saves the given data to a table in DuckDB.
def _save_data(self, data, table_name, path, *args, **kwargs):
Arguments:
data(pandas.DataFrame): The data to save.
table_name(str): Name of the table to save the data to.
path(str): Path to the DuckDB instance.
__map_pandas_col_type_to_duckdb_type¶
This private method maps pandas data types to DuckDB data types.
def __map_pandas_col_type_to_duckdb_type(self, col_type):
Arguments:
- col_type (type): Pandas data type.
Returns
- column_type (type): DuckDB data type corresponding to pandas data type.
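A type mapping of this kind can be sketched as a lookup table keyed on the pandas dtype name. The entries and the VARCHAR fallback below are assumptions chosen for illustration. The connector's own mapping may differ, and the names PANDAS_TO_DUCKDB and map_dtype_name are hypothetical.

```python
# Illustrative mapping from pandas dtype names to DuckDB column types.
# Assumed for this sketch; not taken from the lolpop source.
PANDAS_TO_DUCKDB = {
    "int64": "BIGINT",
    "int32": "INTEGER",
    "float64": "DOUBLE",
    "float32": "FLOAT",
    "bool": "BOOLEAN",
    "datetime64[ns]": "TIMESTAMP",
    "object": "VARCHAR",
}


def map_dtype_name(dtype_name, default="VARCHAR"):
    """Return a DuckDB type name for a pandas dtype or dtype name.

    Unknown dtypes fall back to `default` so every column still gets a type.
    """
    # str() lets callers pass either "int64" or an actual dtype object.
    return PANDAS_TO_DUCKDB.get(str(dtype_name), default)


for name in ["int64", "float64", "object", "category"]:
    print(name, "->", map_dtype_name(name))
```

With a dataframe in hand, a call such as map_dtype_name(df["page"].dtype) would look up the type for one column.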