twarc2sql.db_utils package¶
Subpackages¶
Submodules¶
twarc2sql.db_utils.db_access module¶
Module for accessing the database.
This module contains functions for creating and deleting databases and their tables, as defined in models.py.
- exception twarc2sql.db_utils.db_access.DatabaseException(message: str)[source]¶
Bases: Exception
Exception for database errors.
- twarc2sql.db_utils.db_access.create_db(config_file_path: str | None = None) Engine[source]¶
Create a database with the specified name if it does not already exist.
- Parameters:
config_file_path (Optional[str], optional) – Path to the .env file. Defaults to None, in which case .env in the current directory is used.
- Returns:
engine – SQLAlchemy engine for the database created
- Return type:
sa.engine.base.Engine
- Raises:
DatabaseException – if the database already exists
sa.exc.OperationalError – if the database could not be created
- twarc2sql.db_utils.db_access.create_db_with_tables(config_file_path: str | None = None) Engine[source]¶
Create a database and create the tables for the database.
This is a wrapper around the create_db and create_tables functions.
- Parameters:
config_file_path (Optional[str], optional) – Path to the .env or equivalent file. If None, defaults to .env in the current directory.
- Returns:
engine – SQLAlchemy engine for the database created
- Return type:
sa.engine.base.Engine
- twarc2sql.db_utils.db_access.create_engine(uri: str) Engine[source]¶
Create a SQLAlchemy engine for the database specified by the URI.
- Parameters:
uri (str) – URI for the database to connect to or create
- Returns:
engine – SQLAlchemy engine for the database
- Return type:
sa.engine.base.Engine
- twarc2sql.db_utils.db_access.create_tables(engine: sqlalchemy.engine, base: typing.Any) None[source]¶
Create the tables for the database.
- Parameters:
engine (sa.engine) – SQLAlchemy engine for the database
base (sqlalchemy.ext.declarative.api.DeclarativeMeta) – Base class for the database schema
- Return type:
None
- twarc2sql.db_utils.db_access.create_uri(db_name: str, db_user: str, db_password: str, db_host: str, db_port: str) str[source]¶
Create a URI for a database connection using the specified parameters.
- Parameters:
db_name (str) – the name of the database to connect to or create
db_user (str) – the username for authentication to connect to the database
db_password (str) – the password for authentication to connect to the database
db_host (str) – the host of the database to connect to
db_port (str) – the port of the database to connect to
- Returns:
uri – the URI for the database
- Return type:
str
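The following is a minimal sketch of how a connection URI can be assembled from these five parameters. It is not the library's implementation: the helper name build_uri, the scheme argument, and its "postgresql" default are assumptions made for illustration, since this reference does not state which database dialect create_uri targets or whether it escapes credentials.

```python
from urllib.parse import quote_plus


def build_uri(
    db_name: str,
    db_user: str,
    db_password: str,
    db_host: str,
    db_port: str,
    scheme: str = "postgresql",  # assumption: the dialect is not documented here
) -> str:
    """Assemble a database URI from its parts (hypothetical helper)."""
    # Percent-encode the credentials so that characters such as "@" or ":"
    # inside a password cannot be mistaken for URI delimiters.
    user = quote_plus(db_user)
    password = quote_plus(db_password)
    return f"{scheme}://{user}:{password}@{db_host}:{db_port}/{db_name}"


uri = build_uri("tweets", "alice", "p@ss:word", "localhost", "5432")
print(uri)
```

Escaping the credentials is a design choice of this sketch; without it, a password containing "@" would shift where the host part of the URI appears to begin.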
- twarc2sql.db_utils.db_access.delete_db(config_file_path: str | None = None) bool[source]¶
Delete the database with the specified name if it exists; do nothing if it does not exist.
Note
A True return value does not guarantee that a database was deleted, since the database may not have existed in the first place.
- Parameters:
config_file_path (Optional[str], optional) – Path to the .env file. Defaults to None, in which case .env in the current directory is used.
- Returns:
db_does_not_exist – True if the database does not exist, False otherwise
- Return type:
bool
- twarc2sql.db_utils.db_access.get_engine(config_file_path: str | None = None) Engine[source]¶
Get the SQLAlchemy engine for the database.
- Parameters:
config_file_path (Optional[str], optional) – Path to the .env or equivalent file. If None, defaults to .env in the current directory.
- Returns:
engine – SQLAlchemy engine for the database
- Return type:
sa.engine.base.Engine
- Raises:
DatabaseException – if the database does not exist
- twarc2sql.db_utils.db_access.load_db_config(file_path: str | None = None) Dict[str, str][source]¶
Load env variables from file_path and return a dictionary of the database variables.
- Parameters:
file_path (Optional[str], optional) – Path to the .env file. Defaults to None, in which case .env in the current directory is used.
- Returns:
db_variables – Dictionary of the database variables
- Return type:
Dict[str, str]
- Raises:
AssertionError – if the environment file specified does not exist or if the environment variables are not set
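The behaviour described for load_db_config (read an env file, return only the database variables, raise AssertionError when the file or a variable is missing) could be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: the function name read_db_config and the key names in REQUIRED_KEYS are hypothetical, and the sketch parses the file directly rather than loading values into the process environment.

```python
import os
import tempfile
from typing import Dict, Optional

# Hypothetical variable names: the keys twarc2sql actually expects are not
# listed in this reference.
REQUIRED_KEYS = ("DB_NAME", "DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT")


def read_db_config(file_path: Optional[str] = None) -> Dict[str, str]:
    """Parse KEY=VALUE lines from an env file and return the database keys."""
    path = file_path if file_path is not None else ".env"
    assert os.path.isfile(path), f"environment file not found: {path}"

    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip("\"'")

    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    assert not missing, f"missing environment variables: {missing}"
    return {key: values[key] for key in REQUIRED_KEYS}


# Usage: write a throwaway env file and load it.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = os.path.join(tmp_dir, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write(
            "# local settings\n"
            "DB_NAME=tweets\nDB_USER=alice\nDB_PASSWORD='secret'\n"
            "DB_HOST=localhost\nDB_PORT=5432\n"
        )
    config = read_db_config(env_path)
```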
twarc2sql.db_utils.models module¶
The SQLAlchemy models for creating the database tables. The models are based on the Twitter API v2 Tweet object. They are an opinionated normalization of the Tweet object that makes the database easier to query.
- class twarc2sql.db_utils.models.Annonation_Tweet_Mapping(**kwargs)[source]¶
Bases: Base
- end¶
end index of the annotation in the tweet
- id¶
The unique identifier of the annotation
- normalized_text¶
normalized text of the annotation in the tweet
- probability¶
probability of the annotation in the tweet
- start¶
start index of the annotation in the tweet
- table_args = (UniqueConstraint(),)¶
- tweet_id¶
The unique identifier of the tweet
- type¶
type of the annotation in the tweet
- class twarc2sql.db_utils.models.Author(**kwargs)[source]¶
Bases: Base
- property clean_text¶
Remove newlines from the tweet text.
- created_at¶
The date and time when the author was created
- description¶
The description of the author
- followers_count¶
The number of followers of the author
- following_count¶
The number of accounts the author is following
- id¶
The unique identifier of the author
- listed_count¶
The number of lists the author is in
- location¶
The location of the author
- name¶
The name of the author
- profile_image_url¶
The profile image url of the author
- protected¶
Whether the author is protected
- tweet_count¶
The number of tweets of the author
- url¶
The url of the author
- username¶
The username of the author
- verified¶
Whether the author is verified
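The clean_text property is described only as removing newlines from the text. A minimal sketch of such a property is shown here on a plain dataclass; the class name TextRow is hypothetical, and replacing each run of line breaks with a single space (rather than deleting it outright) is an assumption, since the reference does not say what the newlines are replaced with.

```python
import re
from dataclasses import dataclass


@dataclass
class TextRow:
    """Hypothetical stand-in for a model row; not the SQLAlchemy class."""

    text: str

    @property
    def clean_text(self) -> str:
        # Assumption: every run of "\r" or "\n" characters becomes one space,
        # so words on adjacent lines do not fuse together.
        return re.sub(r"[\r\n]+", " ", self.text).strip()


row = TextRow("line one\nline two\r\n\r\nline three")
print(row.clean_text)
```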
- class twarc2sql.db_utils.models.Castag_Tweet_Mapping(**kwargs)[source]¶
Bases: Base
- end¶
end index of the cashtag in the tweet
- id¶
The unique identifier of the cashtag
- start¶
start index of the cashtag in the tweet
- table_args = (UniqueConstraint(),)¶
- tag¶
cashtag in the tweet
- tweet_id¶
The unique identifier of the tweet
- class twarc2sql.db_utils.models.Hastag_Tweet_Mapping(**kwargs)[source]¶
Bases: Base
- end¶
end index of the hashtag in the tweet
- id¶
The unique identifier of the hashtag
- start¶
start index of the hashtag in the tweet
- table_args = (UniqueConstraint(),)¶
- tag¶
hashtag in the tweet
- tweet_id¶
The unique identifier of the tweet
- class twarc2sql.db_utils.models.Mention_Tweet_Mapping(**kwargs)[source]¶
Bases: Base
- author_id¶
id of the user mentioned in the tweet
- end¶
end index of the username in the tweet
- id¶
The unique identifier of the mention
- start¶
start index of the username in the tweet
- table_args = (UniqueConstraint(),)¶
- tweet_id¶
The unique identifier of the tweet
- username¶
username in the tweet
- class twarc2sql.db_utils.models.Quoted_Tweet_Mapping(**kwargs)[source]¶
Bases: Base
- id¶
The unique identifier of the quote
- tweet_id¶
The unique identifier of the tweet being quoted
- class twarc2sql.db_utils.models.Replied_Tweet_Mapping(**kwargs)[source]¶
Bases: Base
- id¶
The unique identifier of the reply
- in_reply_to_user_id¶
The unique identifier of the user being replied to
- tweet_id¶
The unique identifier of the tweet being replied to
- class twarc2sql.db_utils.models.Retweet_Tweet_Mapping(**kwargs)[source]¶
Bases: Base
- id¶
The unique identifier of the retweet
- tweet_id¶
The unique identifier of the tweet being retweeted
- class twarc2sql.db_utils.models.Tweet(**kwargs)[source]¶
Bases: Base
- author¶
- author_id¶
The author of the tweet
- property clean_text¶
Remove newlines from the tweet text.
- conversation_id¶
The conversation id of the tweet
- created_at¶
The date and time when the tweet was created
- id¶
The unique identifier of the tweet
- impression_count¶
The number of times the tweet was viewed
- in_reply_to_user_id¶
The user id of the user the tweet is replying to
- lang¶
The language of the tweet
- like_count¶
The number of times the tweet was liked
- possibly_sensitive¶
Whether the tweet is possibly sensitive
- quote_count¶
The number of times the tweet was quoted
- reply_count¶
The number of times the tweet was replied to
- reply_settings¶
The reply settings of the tweet
- retweet_count¶
The number of times the tweet was retweeted
- text¶
The text of the tweet
- tweet_type¶
The type of the tweet: 0 = original, 1 = quote tweet, 2 = retweeted tweet, 3 = reply, 4 = quoted tweet + replied to tweet
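When reading rows back, the integer codes listed for tweet_type can be easier to work with as named constants. The enum below is a sketch for client code only: the class TweetType, its member names, and the describe helper are hypothetical and are not part of twarc2sql; only the numeric codes 0 to 4 come from the description above.

```python
from enum import IntEnum


class TweetType(IntEnum):
    """Hypothetical names for the integer codes documented for tweet_type."""

    ORIGINAL = 0
    QUOTE = 1
    RETWEET = 2
    REPLY = 3
    QUOTE_AND_REPLY = 4  # a tweet that both quotes and replies


def describe(code: int) -> str:
    """Return a lowercase label for a stored code.

    Raises ValueError for codes outside the documented range.
    """
    return TweetType(code).name.lower()


label = describe(3)
print(label)
```

Because IntEnum members compare equal to plain integers, a value such as TweetType.REPLY can be compared directly against the stored integer column without conversion.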
- class twarc2sql.db_utils.models.Url_Tweet_Mapping(**kwargs)[source]¶
Bases: Base
- display_url¶
display url in the tweet
- end¶
end index of the url in the tweet
- expanded_url¶
expanded url in the tweet
- id¶
The unique identifier of the url
- media_key¶
media key in the tweet
- start¶
start index of the url in the tweet
- table_args = (UniqueConstraint(),)¶
- tweet_id¶
The unique identifier of the tweet
- url¶
url in the tweet
Module contents¶
This module contains functions for creating and interacting with a database.