mirror of https://github.com/EstrellaXD/Auto_Bangumi.git synced 2026-02-04 10:43:52 +08:00


EstrellaXD 1f5d92f50b docs(dev): add database developer guide

Comprehensive documentation covering:
- Database architecture and components
- Model schemas (Bangumi, RSSItem, Torrent, User)
- Common CRUD operations for each sub-database
- Caching strategy and invalidation
- Migration system and how to add new migrations
- Performance patterns (batch queries, regex matching, indexes)
- Testing setup with factories
- Common issues and solutions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-26 14:51:50 +01:00

11 KiB


Database Developer Guide

This guide covers the database architecture, models, and operations in AutoBangumi.

Overview

AutoBangumi stores its data in SQLite and uses SQLModel (a Pydantic + SQLAlchemy hybrid) as its ORM. The database file is located at data/data.db.

Architecture

module/database/
├── engine.py       # SQLAlchemy engine configuration
├── combine.py      # Database class, migrations, session management
├── bangumi.py      # Bangumi (anime subscription) operations
├── rss.py          # RSS feed operations
├── torrent.py      # Torrent tracking operations
└── user.py         # User authentication operations

Core Components

Database Class

The Database class in combine.py is the main entry point. It inherits from SQLModel's Session and provides access to all sub-databases:

from module.database import Database

with Database() as db:
    # Access sub-databases
    bangumis = db.bangumi.search_all()
    rss_items = db.rss.search_active()
    torrents = db.torrent.search_all()

Sub-Database Classes

Class	Model	Purpose
`BangumiDatabase`	`Bangumi`	Anime subscription rules
`RSSDatabase`	`RSSItem`	RSS feed sources
`TorrentDatabase`	`Torrent`	Downloaded torrent tracking
`UserDatabase`	`User`	Authentication

Models

Bangumi Model

Core model for anime subscriptions:

class Bangumi(SQLModel, table=True):
    id: int                          # Primary key
    official_title: str              # Display name (e.g., "Mushoku Tensei")
    title_raw: str                   # Raw title for torrent matching (indexed)
    season: int = 1                  # Season number
    episode_offset: int = 0          # Episode numbering adjustment
    season_offset: int = 0           # Season numbering adjustment
    rss_link: str                    # Comma-separated RSS feed URLs
    filter: str                      # Exclusion filter (e.g., "720,\\d+-\\d+")
    poster_link: str                 # TMDB poster URL
    save_path: str                   # Download destination path
    rule_name: str                   # qBittorrent RSS rule name
    added: bool = False              # Whether rule is added to downloader
    deleted: bool = False            # Soft delete flag (indexed)
    archived: bool = False           # For completed series (indexed)
    needs_review: bool = False       # Offset mismatch detected
    needs_review_reason: str         # Reason for review
    suggested_season_offset: int     # Suggested season offset
    suggested_episode_offset: int    # Suggested episode offset
    air_weekday: int                 # Airing day (0=Sunday, 6=Saturday)

RSSItem Model

RSS feed subscriptions:

class RSSItem(SQLModel, table=True):
    id: int                          # Primary key
    name: str                        # Display name
    url: str                         # Feed URL (unique, indexed)
    aggregate: bool = True           # Whether to parse torrents
    parser: str = "mikan"            # Parser type: mikan, dmhy, nyaa
    enabled: bool = True             # Active flag
    connection_status: str           # "healthy" or "error"
    last_checked_at: str             # ISO timestamp
    last_error: str                  # Last error message

Torrent Model

Tracks downloaded torrents:

class Torrent(SQLModel, table=True):
    id: int                          # Primary key
    name: str                        # Torrent name (indexed)
    url: str                         # Torrent/magnet URL (unique, indexed)
    rss_id: int                      # Source RSS feed ID
    bangumi_id: int                  # Linked Bangumi ID (nullable)
    qb_hash: str                     # qBittorrent info hash (indexed)
    downloaded: bool = False         # Download completed

Common Operations

BangumiDatabase

with Database() as db:
    # Create
    db.bangumi.add(bangumi)              # Single insert
    db.bangumi.add_all(bangumi_list)     # Batch insert (deduplicates)

    # Read
    db.bangumi.search_all()              # All records (cached, 5min TTL)
    db.bangumi.search_id(123)            # By ID
    db.bangumi.match_torrent("torrent name")  # Find by title_raw match
    db.bangumi.not_complete()            # Incomplete series
    db.bangumi.get_needs_review()        # Flagged for review

    # Update
    db.bangumi.update(bangumi)           # Update single record
    db.bangumi.update_all(bangumi_list)  # Batch update

    # Delete
    db.bangumi.delete_one(123)           # Hard delete
    db.bangumi.disable_rule(123)         # Soft delete (deleted=True)

RSSDatabase

with Database() as db:
    # Create
    db.rss.add(rss_item)                 # Single insert
    db.rss.add_all(rss_items)            # Batch insert (deduplicates)

    # Read
    db.rss.search_all()                  # All feeds
    db.rss.search_active()               # Enabled feeds only
    db.rss.search_aggregate()            # Enabled + aggregate=True

    # Update
    db.rss.update(id, rss_update)        # Partial update
    db.rss.enable(id)                    # Enable feed
    db.rss.disable(id)                   # Disable feed
    db.rss.enable_batch([1, 2, 3])       # Batch enable
    db.rss.disable_batch([1, 2, 3])      # Batch disable

TorrentDatabase

with Database() as db:
    # Create
    db.torrent.add(torrent)              # Single insert
    db.torrent.add_all(torrents)         # Batch insert

    # Read
    db.torrent.search_all()              # All torrents
    db.torrent.search_by_qb_hash(hash)   # By qBittorrent hash
    db.torrent.search_by_url(url)        # By URL
    db.torrent.check_new(torrents)       # Filter out existing

    # Update
    db.torrent.update_qb_hash(id, hash)  # Set qb_hash

Caching

Bangumi Cache

search_all() results are cached at the module level with a 5-minute TTL:

# Module-level cache in bangumi.py
_bangumi_cache: list[Bangumi] | None = None
_bangumi_cache_time: float = 0
_BANGUMI_CACHE_TTL: float = 300.0  # 5 minutes

# Cache invalidation
def _invalidate_bangumi_cache():
    global _bangumi_cache, _bangumi_cache_time
    _bangumi_cache = None
    _bangumi_cache_time = 0

Important: The cache is automatically invalidated on:

add(), add_all()
update(), update_all()
delete_one(), delete_all()
archive_one(), unarchive_one()
Any RSS link update operations

Session Expunge

Cached objects are expunged from the session to prevent DetachedInstanceError:

for b in bangumis:
    self.session.expunge(b)  # Detach from session

Migration System

Schema Versioning

Migrations are tracked via a schema_version table:

CURRENT_SCHEMA_VERSION = 7

# Each migration: (version, description, [SQL statements])
MIGRATIONS = [
    (1, "add air_weekday column", [...]),
    (2, "add connection status columns", [...]),
    (3, "create passkey table", [...]),
    (4, "add archived column", [...]),
    (5, "rename offset to episode_offset", [...]),
    (6, "add qb_hash column", [...]),
    (7, "add suggested offset columns", [...]),
]
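A version-tracked runner over tuples of this shape might look like the following sketch. It uses the standard-library sqlite3 module rather than the project's SQLAlchemy engine, the SQL statements are hypothetical, and the single-column schema_version layout is an assumption.

```python
import sqlite3

# Hypothetical migrations in the same (version, description, statements) shape.
MIGRATIONS = [
    (1, "add air_weekday column",
     ["ALTER TABLE bangumi ADD COLUMN air_weekday INTEGER"]),
    (2, "add archived column",
     ["ALTER TABLE bangumi ADD COLUMN archived INTEGER DEFAULT 0"]),
]


def get_version(conn: sqlite3.Connection) -> int:
    """Return the highest applied version, or 0 for a fresh database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def run_migrations(conn: sqlite3.Connection) -> list:
    """Apply every migration newer than the stored version; return those applied."""
    current = get_version(conn)
    applied = []
    for version, _description, statements in MIGRATIONS:
        if version <= current:
            continue  # already applied on an earlier run
        for sql in statements:
            conn.execute(sql)
        conn.execute(
            "INSERT INTO schema_version (version) VALUES (?)", (version,)
        )
        applied.append(version)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bangumi (id INTEGER PRIMARY KEY, title_raw TEXT)")
first_run = run_migrations(conn)   # applies versions 1 and 2
second_run = run_migrations(conn)  # nothing left to apply
```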

Adding a New Migration

Increment CURRENT_SCHEMA_VERSION in combine.py
Add migration tuple to MIGRATIONS list:

MIGRATIONS = [
    # ... existing migrations ...
    (
        8,
        "add my_new_column to bangumi",
        [
            "ALTER TABLE bangumi ADD COLUMN my_new_column TEXT DEFAULT NULL",
        ],
    ),
]

Add idempotency check in run_migrations():

if "bangumi" in tables and version == 8:
    columns = [col["name"] for col in inspector.get_columns("bangumi")]
    if "my_new_column" in columns:
        needs_run = False

Update the corresponding Pydantic model in module/models/
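The idempotency check in the steps above inspects the table before altering it. The same idea can be sketched with the standard-library sqlite3 module, using PRAGMA table_info in place of the SQLAlchemy inspector; the helper names are hypothetical.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def add_column_if_missing(
    conn: sqlite3.Connection, table: str, column: str, ddl_type: str
) -> bool:
    """Add the column unless it already exists; return True if it was added."""
    if column_exists(conn, table, column):
        return False
    # Identifiers cannot be bound as parameters, so only pass trusted names.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bangumi (id INTEGER PRIMARY KEY)")
first = add_column_if_missing(conn, "bangumi", "my_new_column", "TEXT DEFAULT NULL")
second = add_column_if_missing(conn, "bangumi", "my_new_column", "TEXT DEFAULT NULL")
```

Without such a check, rerunning the ALTER TABLE statement against a database that already has the column fails with a "duplicate column name" error.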

Default Value Backfill

After migrations, _fill_null_with_defaults() automatically fills NULL values based on model defaults:

# If model defines:
class Bangumi(SQLModel, table=True):
    my_field: bool = False

# Then existing rows with NULL will be updated to False
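A backfill of this kind boils down to one UPDATE per defaulted column. The sketch below assumes a hand-written column-to-default mapping (DEFAULTS) instead of reading defaults from the model, and uses sqlite3 directly; it is an illustration rather than the body of _fill_null_with_defaults().

```python
import sqlite3

# Hypothetical column -> default mapping; the real code derives it from the model.
DEFAULTS = {"added": 0, "deleted": 0}


def fill_null_with_defaults(
    conn: sqlite3.Connection, table: str, defaults: dict
) -> int:
    """Replace NULLs with the given defaults; return the number of cells updated."""
    updated = 0
    for column, default in defaults.items():
        cur = conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE {column} IS NULL",
            (default,),
        )
        updated += cur.rowcount
    conn.commit()
    return updated


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bangumi (id INTEGER PRIMARY KEY, added INTEGER, deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO bangumi (id, added, deleted) VALUES (?, ?, ?)",
    [(1, None, None), (2, 1, None)],
)
updated = fill_null_with_defaults(conn, "bangumi", DEFAULTS)
rows = conn.execute("SELECT id, added, deleted FROM bangumi ORDER BY id").fetchall()
```

Rows that already hold a value are left untouched, so the backfill is safe to run after every migration pass.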

Performance Patterns

Batch Queries

add_all() uses a single query to check for duplicates instead of N queries:

from sqlalchemy import and_, or_
from sqlmodel import select

# Efficient: single SELECT
keys_to_check = [(d.title_raw, d.group_name) for d in datas]
conditions = [
    and_(Bangumi.title_raw == tr, Bangumi.group_name == gn)
    for tr, gn in keys_to_check
]
statement = select(Bangumi.title_raw, Bangumi.group_name).where(or_(*conditions))
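The single-query duplicate check can be tried out with plain sqlite3. This is a sketch of the technique only: the table layout and the find_existing helper are hypothetical, and the OR-of-ANDs clause is built by hand instead of with SQLAlchemy expressions.

```python
import sqlite3


def find_existing(conn: sqlite3.Connection, keys: list) -> set:
    """Return the (title_raw, group_name) pairs from keys that already exist."""
    if not keys:
        return set()
    # One "(title_raw = ? AND group_name = ?)" term per key, joined with OR.
    clause = " OR ".join("(title_raw = ? AND group_name = ?)" for _ in keys)
    params = [value for key in keys for value in key]
    rows = conn.execute(
        f"SELECT title_raw, group_name FROM bangumi WHERE {clause}", params
    ).fetchall()
    return set(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bangumi (title_raw TEXT, group_name TEXT)")
conn.executemany(
    "INSERT INTO bangumi VALUES (?, ?)", [("A", "g1"), ("B", "g2")]
)

candidates = [("A", "g1"), ("B", "gX"), ("C", "g3")]
existing = find_existing(conn, candidates)
to_insert = [key for key in candidates if key not in existing]
```

SQLite limits the number of bound parameters per statement, so very large batches may need to be split into chunks.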

Regex Matching

match_list() compiles a single regex pattern for all title matches:

import re

# Compile once, match many
sorted_titles = sorted(title_index.keys(), key=len, reverse=True)
pattern = "|".join(re.escape(title) for title in sorted_titles)
title_regex = re.compile(pattern)

# One regex scan per torrent instead of a Python loop over every title
for torrent in torrent_list:
    match = title_regex.search(torrent.name)
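A small self-contained run makes the longest-first ordering concrete. The title_index contents and the match_id helper are hypothetical; the pattern construction follows the snippet above.

```python
import re
from typing import Optional

# Hypothetical index: title_raw -> bangumi id
title_index = {"Mushoku Tensei": 1, "Mushoku Tensei II": 2, "Frieren": 3}

# Longest titles first, so a longer title wins over a shorter one that is its prefix.
sorted_titles = sorted(title_index, key=len, reverse=True)
title_regex = re.compile("|".join(re.escape(t) for t in sorted_titles))


def match_id(torrent_name: str) -> Optional[int]:
    """Return the bangumi id whose title_raw appears in the name, if any."""
    match = title_regex.search(torrent_name)
    return title_index[match.group(0)] if match else None


season_two = match_id("[Sub] Mushoku Tensei II - 03 [1080p].mkv")
season_one = match_id("[Sub] Mushoku Tensei - 05 [1080p].mkv")
no_match = match_id("[Sub] Unrelated Show - 01.mkv")
```

Python's regex alternation tries branches left to right, so without the length sort "Mushoku Tensei" could match first and hide "Mushoku Tensei II".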

Indexed Columns

The following columns have indexes for fast lookups:

Table	Column	Index Type
`bangumi`	`title_raw`	Regular
`bangumi`	`deleted`	Regular
`bangumi`	`archived`	Regular
`rssitem`	`url`	Unique
`torrent`	`name`	Regular
`torrent`	`url`	Unique
`torrent`	`qb_hash`	Regular

Testing

Test Database Setup

Tests use an in-memory SQLite database:

# conftest.py
import pytest
from sqlmodel import Session, SQLModel, create_engine

@pytest.fixture
def db_engine():
    engine = create_engine("sqlite:///:memory:")
    SQLModel.metadata.create_all(engine)
    yield engine
    engine.dispose()

@pytest.fixture
def db_session(db_engine):
    with Session(db_engine) as session:
        yield session

Factory Functions

Use factory functions for creating test data:

from test.factories import make_bangumi, make_torrent, make_rss_item

def test_bangumi_search():
    bangumi = make_bangumi(title_raw="Test Title", season=2)
    # ... test logic
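The factory pattern itself is small: sensible defaults plus keyword overrides. The sketch below uses a dataclass stand-in (BangumiStub) and a hypothetical make_bangumi_stub function; the real factories in test/factories may use different defaults and return actual model instances.

```python
from dataclasses import dataclass


@dataclass
class BangumiStub:
    """Hypothetical stand-in for the Bangumi model with a few of its fields."""

    official_title: str = "Test Anime"
    title_raw: str = "Test Anime Raw"
    season: int = 1
    deleted: bool = False


def make_bangumi_stub(**overrides) -> BangumiStub:
    # Tests pass only the fields they care about; the rest use the defaults.
    return BangumiStub(**overrides)


bangumi = make_bangumi_stub(title_raw="Test Title", season=2)
```

Keeping defaults in one place means a new required field only has to be added to the factory, not to every test.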

Design Notes

No Foreign Keys

SQLite foreign key enforcement is disabled by default. Relationships (like Torrent.bangumi_id) are managed in application logic rather than database constraints.
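Because nothing in the schema stops a torrent from pointing at a missing bangumi row, an application-level consistency check can be useful. The following sketch finds such orphans with a LEFT JOIN; the minimal table layout and the find_orphan_torrents helper are hypothetical.

```python
import sqlite3


def find_orphan_torrents(conn: sqlite3.Connection) -> list:
    """Return ids of torrents whose bangumi_id has no matching bangumi row."""
    rows = conn.execute(
        "SELECT t.id FROM torrent AS t "
        "LEFT JOIN bangumi AS b ON b.id = t.bangumi_id "
        "WHERE t.bangumi_id IS NOT NULL AND b.id IS NULL "
        "ORDER BY t.id"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bangumi (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE torrent (id INTEGER PRIMARY KEY, bangumi_id INTEGER)")
conn.execute("INSERT INTO bangumi (id) VALUES (1)")
conn.executemany(
    "INSERT INTO torrent (id, bangumi_id) VALUES (?, ?)",
    [(1, 1), (2, 99), (3, None)],  # torrent 2 points at a missing bangumi
)
orphans = find_orphan_torrents(conn)
```

Torrents with a NULL bangumi_id are treated as unlinked rather than orphaned, matching the nullable column described in the Torrent model.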

Soft Deletes

The Bangumi.deleted flag enables soft deletes. Queries should filter by deleted=False for user-facing data:

from sqlalchemy import false  # SQL FALSE literal
statement = select(Bangumi).where(Bangumi.deleted == false())
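The effect of a soft delete is easy to see with plain sqlite3. In this sketch, disable_rule and visible_titles are hypothetical helpers written for illustration; they are not the project's methods of the same or similar name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bangumi ("
    "id INTEGER PRIMARY KEY, official_title TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO bangumi (official_title) VALUES (?)",
    [("Show A",), ("Show B",)],
)


def disable_rule(conn: sqlite3.Connection, bangumi_id: int) -> None:
    # Soft delete: flag the row instead of removing it.
    conn.execute("UPDATE bangumi SET deleted = 1 WHERE id = ?", (bangumi_id,))


def visible_titles(conn: sqlite3.Connection) -> list:
    # User-facing queries skip soft-deleted rows.
    rows = conn.execute(
        "SELECT official_title FROM bangumi WHERE deleted = 0 ORDER BY id"
    )
    return [row[0] for row in rows]


disable_rule(conn, 1)
visible = visible_titles(conn)
total = conn.execute("SELECT COUNT(*) FROM bangumi").fetchone()[0]
```

The flagged row stays in the table, so it can be restored later by resetting the flag.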

Torrent Tagging

Torrents are tagged in qBittorrent with ab:{bangumi_id} for offset lookup during rename operations. This enables fast bangumi identification without database queries.
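Building and parsing such a tag needs no database access. The helpers below are a hypothetical sketch of the ab:{bangumi_id} convention, not the project's actual rename code.

```python
import re
from typing import Iterable, Optional

_TAG_RE = re.compile(r"ab:(\d+)")


def make_tag(bangumi_id: int) -> str:
    return f"ab:{bangumi_id}"


def parse_bangumi_id(tags: Iterable[str]) -> Optional[int]:
    """Return the bangumi id from the first well-formed ab:<id> tag, if any."""
    for tag in tags:
        match = _TAG_RE.fullmatch(tag.strip())
        if match:
            return int(match.group(1))
    return None


tagged = parse_bangumi_id(["anime", make_tag(42)])
untagged = parse_bangumi_id(["anime", "ab:not-a-number"])
```

Using fullmatch rejects tags that merely contain the prefix, such as "lab:1" or "ab:1x".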

Common Issues

DetachedInstanceError

This error can occur when cached objects are used with a different session:

# Wrong: accessing cached object in new session
bangumis = db.bangumi.search_all()  # Cached
with Database() as new_db:
    new_db.session.add(bangumis[0])  # Error!

# Right: objects are expunged, work independently
bangumis = db.bangumi.search_all()
bangumis[0].title_raw = "New Title"  # OK, but won't persist

Cache Staleness

If manual SQL updates bypass the ORM, invalidate the cache:

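from sqlalchemy import text

from module.database.bangumi import _invalidate_bangumi_cache

# `engine` is the SQLAlchemy engine configured in module/database/engine.py
with engine.connect() as conn: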
    conn.execute(text("UPDATE bangumi SET ..."))
    conn.commit()

_invalidate_bangumi_cache()  # Important!
