S3DataConnector

This class defines a data connector that reads from and writes to an AWS S3 bucket using the boto3 package. It is a child of the BaseDataConnector class and inherits that class's methods.

Configuration

Required Configuration

  • AWS_S3_BUCKET: Name of the S3 bucket to read data from and write data to.
  • AWS_ACCESS_KEY_ID: AWS access key ID.
  • AWS_SECRET_KEY_ID: AWS secret access key.
  • AWS_SESSION_TOKEN: AWS session token.
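
Because all four keys are required, it can help to check a configuration dictionary before constructing the connector. The following is a minimal sketch, not lolpop's own validation logic: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the example assumes the keys sit at the top level of a plain dictionary.

```python
# Hypothetical helper: the key names come from the list above, but the
# function itself is an illustration and is not part of lolpop.
REQUIRED_KEYS = (
    "AWS_S3_BUCKET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_KEY_ID",
    "AWS_SESSION_TOKEN",
)


def missing_keys(config):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values only; real credentials should not be hard-coded.
example_config = {
    "AWS_S3_BUCKET": "my-bucket",
    "AWS_ACCESS_KEY_ID": "example-access-key-id",
    "AWS_SECRET_KEY_ID": "example-secret",
    "AWS_SESSION_TOKEN": "example-token",
}

problems = missing_keys(example_config)
if problems:
    raise ValueError(f"Missing required configuration: {problems}")
```

An empty list means every required key has a non-empty value.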

Optional Configuration

There is no optional configuration.

Default Configuration

There is no default configuration.

Methods

get_data

This method reads data from an AWS S3 bucket and returns a Pandas DataFrame.

def get_data(self, path, *args, **kwargs)

Arguments

  • path: The location of the file in the bucket.

Returns - A Pandas DataFrame containing the data from the bucket.

save_data

This method saves data into an AWS S3 bucket.

def save_data(self, data, path, *args, **kwargs)

Arguments

  • data: The data to be written into the bucket.
  • path: The location of the file in the bucket.

_load_data

This method reads data from files within the AWS S3 bucket and returns a Pandas DataFrame.

def _load_data(self, path, config, **kwargs)

Arguments

  • path: The location of the file in the bucket.
  • config: A dictionary that stores the AWS S3 bucket name, AWS access key ID, AWS secret access key, and AWS session token.

Returns - A Pandas DataFrame containing the data from the file.

_save_data

This method saves data into an AWS S3 bucket.

def _save_data(self, data, path, config, *args, **kwargs)

Arguments

  • data: The data to be written into the bucket.
  • path: The location of the file in the bucket.
  • config: A dictionary that stores the AWS S3 bucket name, AWS access key ID, AWS secret access key, and AWS session token.
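
The read and write helpers described above can be pictured as thin wrappers around the S3 `get_object` and `put_object` calls. The sketch below is an illustration under stated assumptions, not lolpop's implementation: it assumes CSV files, the function names `load_csv` and `save_csv` are hypothetical, and `InMemoryS3` is a hypothetical stand-in that mimics only the two client methods used here so the example runs without AWS access.

```python
import io

import pandas as pd


def load_csv(client, bucket, path):
    """Fetch one object and parse its bytes as CSV into a DataFrame."""
    response = client.get_object(Bucket=bucket, Key=path)
    return pd.read_csv(io.BytesIO(response["Body"].read()))


def save_csv(client, bucket, path, data):
    """Serialize a DataFrame to CSV and upload it as a single object."""
    body = data.to_csv(index=False).encode("utf-8")
    client.put_object(Bucket=bucket, Key=path, Body=body)


class InMemoryS3:
    """Hypothetical offline stand-in for a boto3 S3 client."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        # Store the raw bytes under (bucket, key), like an object store.
        self.objects[(Bucket, Key)] = Body
        return {}

    def get_object(self, Bucket, Key):
        # boto3 returns a streaming body with .read(); BytesIO mimics that.
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}


client = InMemoryS3()
frame = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
save_csv(client, "my-bucket", "path/to/new_data.csv", frame)
round_trip = load_csv(client, "my-bucket", "path/to/new_data.csv")
```

Passing a real boto3 S3 client in place of `InMemoryS3` should work the same way, since both functions rely only on `get_object` and `put_object`.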

_get_client

This method returns the AWS S3 client.

def _get_client(self, config)

Arguments

  • config: A dictionary that stores the AWS S3 bucket name, AWS access key ID, AWS secret access key, and AWS session token.

Returns - An AWS S3 client.
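
As a rough illustration of how such a client might be built from the configuration, the sketch below maps the documented keys onto the credential arguments of `boto3.client`. The helper names `client_kwargs` and `get_client` are hypothetical, and the mapping of AWS_SECRET_KEY_ID to `aws_secret_access_key` is an assumption based on the argument descriptions above; lolpop's actual `_get_client` may differ.

```python
def client_kwargs(config):
    """Translate the connector's config keys into boto3 credential arguments."""
    return {
        "aws_access_key_id": config["AWS_ACCESS_KEY_ID"],
        # Assumption: this config key holds the AWS secret access key.
        "aws_secret_access_key": config["AWS_SECRET_KEY_ID"],
        "aws_session_token": config["AWS_SESSION_TOKEN"],
    }


def get_client(config):
    """Create an S3 client using explicit credentials from `config`."""
    # Imported here so the mapping above can be inspected without boto3.
    import boto3

    return boto3.client("s3", **client_kwargs(config))


# Placeholder values only.
kwargs = client_kwargs(
    {
        "AWS_S3_BUCKET": "my-bucket",
        "AWS_ACCESS_KEY_ID": "example-access-key-id",
        "AWS_SECRET_KEY_ID": "example-secret",
        "AWS_SESSION_TOKEN": "example-token",
    }
)
```

The bucket name is not part of the client itself; it would be supplied on each read or write call.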

Usage

import pandas as pd

from lolpop.component import S3DataConnector

config = {
    # insert component config here
}

s3 = S3DataConnector(conf=config)

data = s3.get_data('/path/to/data.csv')

new_data = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})

s3.save_data(new_data, '/path/to/new_data.csv')