twarc2sql.db_utils package

Submodules

twarc2sql.db_utils.db_access module

Module for accessing the database.

The module contains functions for creating and deleting databases and their tables, as defined in models.py.

exception twarc2sql.db_utils.db_access.DatabaseException(message: str)

Bases: Exception

Exception for database errors.

twarc2sql.db_utils.db_access.create_db(config_file_path: str | None = None) → Engine

Create a database with the specified name if it does not already exist.

Parameters:

config_file_path (Optional[str], optional) – Path to the .env file. If None, the .env file in the current directory is used. Defaults to None.

Returns:

engine – SQLAlchemy engine for the database created

Return type:

sa.engine.base.Engine

Raises:
  • DatabaseException – if the database already exists

  • sa.exc.OperationalError – if the database could not be created

twarc2sql.db_utils.db_access.create_db_with_tables(config_file_path: str | None = None) → Engine

Create a database and create the tables for the database.

This is a wrapper around the create_db and create_tables functions.

Parameters:

config_file_path (Optional[str], optional) – Path to the .env or equivalent file. If None, defaults to .env in the current directory.

Returns:

engine – SQLAlchemy engine for the database created

Return type:

sa.engine.base.Engine

twarc2sql.db_utils.db_access.create_engine(uri: str) → Engine

Create a SQLAlchemy engine for the database specified by the URI.

Parameters:

uri (str) – URI for the database to connect to or create

Returns:

engine – SQLAlchemy engine for the database

Return type:

sa.engine.base.Engine

twarc2sql.db_utils.db_access.create_tables(engine: sa.engine, base: Any) → None

Create the tables for the database.

Parameters:
  • engine (sa.engine) – SQLAlchemy engine for the database

  • base (sqlalchemy.ext.declarative.api.DeclarativeMeta) – Base class for the database schema

Return type:

None

twarc2sql.db_utils.db_access.create_uri(db_name: str, db_user: str, db_password: str, db_host: str, db_port: str) → str

Create a URI for a database connection using the specified parameters.

Parameters:
  • db_name (str) – the name of the database to connect to or create

  • db_user (str) – the username for authentication to connect to the database

  • db_password (str) – the password for authentication to connect to the database

  • db_host (str) – the host of the database to connect to

  • db_port (str) – the port of the database to connect to

Returns:

uri – the URI for the database

Return type:

str
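The parameters above map onto the usual components of a database URL. The sketch below is not the library's implementation: it assumes a PostgreSQL-style scheme (the dialect is not stated in this reference), and the build_uri name is hypothetical. It percent-encodes the credentials so that characters such as @ in a password do not break the URL.

```python
from urllib.parse import quote_plus


def build_uri(db_name: str, db_user: str, db_password: str,
              db_host: str, db_port: str) -> str:
    """Hypothetical stand-in for create_uri; the scheme is an assumption."""
    # Percent-encode credentials so reserved characters stay unambiguous.
    user = quote_plus(db_user)
    password = quote_plus(db_password)
    return f"postgresql://{user}:{password}@{db_host}:{db_port}/{db_name}"


uri = build_uri("tweets", "alice", "p@ss", "localhost", "5432")
print(uri)
```

A string in this shape is the kind of value that create_engine accepts as its uri argument.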

twarc2sql.db_utils.db_access.delete_db(config_file_path: str | None = None) → bool

Delete the database with the specified name if it exists; do nothing if it does not exist.

Note

A return value of True does not guarantee that this call deleted the database; the database may not have existed in the first place.

Parameters:

config_file_path (Optional[str], optional) – Path to the .env file. If None, the .env file in the current directory is used. Defaults to None.

Returns:

db_does_not_exist – True if the database does not exist, False otherwise

Return type:

bool

twarc2sql.db_utils.db_access.get_engine(config_file_path: str | None = None) → Engine

Get the SQLAlchemy engine for the database.

Parameters:

config_file_path (Optional[str], optional) – Path to the .env or equivalent file. If None, defaults to .env in the current directory.

Returns:

engine – SQLAlchemy engine for the database

Return type:

sa.engine.base.Engine

Raises:

DatabaseException – if the database does not exist

twarc2sql.db_utils.db_access.load_db_config(file_path: str | None = None) → Dict[str, str]

Load env variables from file_path and return a dictionary of the database variables.

Parameters:

file_path (Optional[str], optional) – Path to the .env file. If None, the .env file in the current directory is used. Defaults to None.

Returns:

db_variables – Dictionary of the database variables

Return type:

Dict[str, str]

Raises:

AssertionError – if the specified environment file does not exist or if the environment variables are not set
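The behaviour described above (read an env file, return only the database variables, and fail with AssertionError when the file or a variable is missing) can be sketched with the standard library alone. This is an illustration, not the library's code: the read_db_config name and the DB_* variable names are hypothetical, and the sketch parses the file directly instead of populating the process environment.

```python
import os
import tempfile
from typing import Dict, Optional

# Hypothetical variable names; the real keys are not listed in this reference.
REQUIRED_KEYS = ("DB_NAME", "DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT")


def read_db_config(file_path: Optional[str] = None) -> Dict[str, str]:
    """Parse KEY=VALUE lines from an env file and return the database keys."""
    path = file_path if file_path is not None else ".env"
    assert os.path.isfile(path), f"environment file not found: {path}"

    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")

    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    assert not missing, f"environment variables not set: {missing}"
    return {key: values[key] for key in REQUIRED_KEYS}


# Usage: write a throwaway env file and load it.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = os.path.join(tmp_dir, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("DB_NAME=tweets\nDB_USER=alice\nDB_PASSWORD=secret\n")
        handle.write("DB_HOST=localhost\nDB_PORT=5432\n")
    config = read_db_config(env_path)
```

The returned dictionary holds exactly the five database keys, which is the shape a helper such as create_uri would need as input.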

twarc2sql.db_utils.models module

The SQLAlchemy models for creating the database tables. The models are based on the Twitter API v2 Tweet object. They are an opinionated normalization of the Tweet object that makes the database easier to query.

class twarc2sql.db_utils.models.Annonation_Tweet_Mapping(**kwargs)

Bases: Base

end

end index of the annotation in the tweet

id

The unique identifier of the annotation

normalized_text

normalized text of the annotation in the tweet

probability

probability of the annotation in the tweet

start

start index of the annotation in the tweet

table_args = (UniqueConstraint(),)

tweet_id

The unique identifier of the tweet

type

type of the annotation in the tweet

class twarc2sql.db_utils.models.Author(**kwargs)

Bases: Base

property clean_text

Remove newlines from the tweet text.

created_at

The date and time when the author was created

description

The description of the author

followers_count

The number of followers of the author

following_count

The number of accounts the author is following

id

The unique identifier of the author

listed_count

The number of lists the author is in

location

The location of the author

name

The name of the author

profile_image_url

The profile image url of the author

protected

Whether the author is protected

tweet_count

The number of tweets of the author

url

The url of the author

username

The username of the author

verified

Whether the author is verified

class twarc2sql.db_utils.models.Castag_Tweet_Mapping(**kwargs)

Bases: Base

end

end index of the cashtag in the tweet

id

The unique identifier of the cashtag

start

start index of the cashtag in the tweet

table_args = (UniqueConstraint(),)

tag

cashtag in the tweet

tweet_id

The unique identifier of the tweet

class twarc2sql.db_utils.models.Hastag_Tweet_Mapping(**kwargs)

Bases: Base

end

end index of the hashtag in the tweet

id

The unique identifier of the hashtag

start

start index of the hashtag in the tweet

table_args = (UniqueConstraint(),)

tag

hashtag in the tweet

tweet_id

The unique identifier of the tweet
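The start and end columns on the mapping tables locate an entity inside the tweet text. The sketch below shows how such offsets could be used to recover the matched span. It relies on several assumptions that this reference does not confirm: that start is inclusive and end is exclusive, that the offsets count characters in the same way Python string indices do, and that tag is stored without the leading # character. HashtagRow is a hypothetical plain-Python stand-in for a row, not the SQLAlchemy model.

```python
from dataclasses import dataclass


@dataclass
class HashtagRow:
    """Hypothetical plain-Python stand-in for one hashtag mapping row."""
    tweet_id: str
    start: int
    end: int
    tag: str


def entity_span(text: str, start: int, end: int) -> str:
    # Assumes start is inclusive and end is exclusive.
    return text[start:end]


text = "Learning #python today"
row = HashtagRow(tweet_id="1", start=9, end=16, tag="python")
span = entity_span(text, row.start, row.end)
```

If the stored offsets turn out to follow a different convention, for example an inclusive end index, the slice bounds need to be adjusted accordingly.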

class twarc2sql.db_utils.models.Mention_Tweet_Mapping(**kwargs)

Bases: Base

author_id

id of the user mentioned in the tweet

end

end index of the username in the tweet

id

The unique identifier of the mention

start

start index of the username in the tweet

table_args = (UniqueConstraint(),)

tweet_id

The unique identifier of the tweet

username

username in the tweet

class twarc2sql.db_utils.models.Quoted_Tweet_Mapping(**kwargs)

Bases: Base

id

The unique identifier of the quote

tweet_id

The unique identifier of the tweet being quoted

class twarc2sql.db_utils.models.Replied_Tweet_Mapping(**kwargs)

Bases: Base

id

The unique identifier of the reply

in_reply_to_user_id

The unique identifier of the user being replied to

tweet_id

The unique identifier of the tweet being replied to

class twarc2sql.db_utils.models.Retweet_Tweet_Mapping(**kwargs)

Bases: Base

id

The unique identifier of the retweet

tweet_id

The unique identifier of the tweet being retweeted

class twarc2sql.db_utils.models.Tweet(**kwargs)

Bases: Base

author

author_id

The author of the tweet

property clean_text

Remove newlines from the tweet text.

conversation_id

The conversation id of the tweet

created_at

The date and time when the tweet was created

id

The unique identifier of the tweet

impression_count

The number of times the tweet was viewed

in_reply_to_user_id

The user id of the user the tweet is replying to

lang

The language of the tweet

like_count

The number of times the tweet was liked

possibly_sensitive

Whether the tweet is possibly sensitive

quote_count

The number of times the tweet was quoted

reply_count

The number of times the tweet was replied to

reply_settings

The reply settings of the tweet

retweet_count

The number of times the tweet was retweeted

text

The text of the tweet

tweet_type

The type of the tweet: 0 = original, 1 = quote tweet, 2 = retweeted tweet, 3 = reply, 4 = quoted tweet + replied to tweet
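When reading tweet_type values back from the database, it can help to give the integer codes names. The enum below is a sketch for client code, not part of the library; the TweetType class, its member names, and the describe helper are all hypothetical, and only the numeric codes come from the description of tweet_type.

```python
from enum import IntEnum


class TweetType(IntEnum):
    """Hypothetical names for the documented tweet_type codes."""
    ORIGINAL = 0
    QUOTE = 1
    RETWEET = 2
    REPLY = 3
    QUOTE_AND_REPLY = 4


def describe(code: int) -> str:
    # Raises ValueError for codes outside 0-4.
    return TweetType(code).name.lower()


label = describe(3)
```

Because IntEnum members compare equal to plain integers, such names could be used directly in comparisons against the stored column value.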

class twarc2sql.db_utils.models.Url_Tweet_Mapping(**kwargs)

Bases: Base

display_url

display url in the tweet

end

end index of the url in the tweet

expanded_url

expanded url in the tweet

id

The unique identifier of the url

media_key

media key in the tweet

start

start index of the url in the tweet

table_args = (UniqueConstraint(),)

tweet_id

The unique identifier of the tweet

url

url in the tweet

Module contents

The module contains functions for creating and interacting with a database.