Auto_Bangumi/docs/dev/database.md

# Database Developer Guide

This guide covers the database architecture, models, and operations in AutoBangumi.

## Overview

AutoBangumi uses **SQLite** as its database with **SQLModel** (Pydantic + SQLAlchemy hybrid) for ORM. The database file is located at `data/data.db`.

### Architecture

```
module/database/
├── engine.py       # SQLAlchemy engine configuration
├── combine.py      # Database class, migrations, session management
├── bangumi.py      # Bangumi (anime subscription) operations
├── rss.py          # RSS feed operations
├── torrent.py      # Torrent tracking operations
└── user.py         # User authentication operations
```

## Core Components

### Database Class

The `Database` class in `combine.py` is the main entry point. It inherits from SQLModel's `Session` and provides access to all sub-databases:

```python
from module.database import Database

with Database() as db:
    # Access sub-databases
    bangumis = db.bangumi.search_all()
    rss_items = db.rss.search_active()
    torrents = db.torrent.search_all()
```

### Sub-Database Classes

| Class | Model | Purpose |
|-------|-------|---------|
| `BangumiDatabase` | `Bangumi` | Anime subscription rules |
| `RSSDatabase` | `RSSItem` | RSS feed sources |
| `TorrentDatabase` | `Torrent` | Downloaded torrent tracking |
| `UserDatabase` | `User` | Authentication |

## Models

### Bangumi Model

Core model for anime subscriptions:

```python
class Bangumi(SQLModel, table=True):
    id: int                          # Primary key
    official_title: str              # Display name (e.g., "Mushoku Tensei")
    title_raw: str                   # Raw title for torrent matching (indexed)
    season: int = 1                  # Season number
    episode_offset: int = 0          # Episode numbering adjustment
    season_offset: int = 0           # Season numbering adjustment
    rss_link: str                    # Comma-separated RSS feed URLs
    filter: str                      # Exclusion filter (e.g., "720,\\d+-\\d+")
    poster_link: str                 # TMDB poster URL
    save_path: str                   # Download destination path
    rule_name: str                   # qBittorrent RSS rule name
    added: bool = False              # Whether rule is added to downloader
    deleted: bool = False            # Soft delete flag (indexed)
    archived: bool = False           # For completed series (indexed)
    needs_review: bool = False       # Offset mismatch detected
    needs_review_reason: str         # Reason for review
    suggested_season_offset: int     # Suggested season offset
    suggested_episode_offset: int    # Suggested episode offset
    air_weekday: int                 # Airing day (0=Sunday, 6=Saturday)
```

### RSSItem Model

RSS feed subscriptions:

```python
class RSSItem(SQLModel, table=True):
    id: int                          # Primary key
    name: str                        # Display name
    url: str                         # Feed URL (unique, indexed)
    aggregate: bool = True           # Whether to parse torrents
    parser: str = "mikan"            # Parser type: mikan, dmhy, nyaa
    enabled: bool = True             # Active flag
    connection_status: str           # "healthy" or "error"
    last_checked_at: str             # ISO timestamp
    last_error: str                  # Last error message
```

### Torrent Model

Tracks downloaded torrents:

```python
class Torrent(SQLModel, table=True):
    id: int                          # Primary key
    name: str                        # Torrent name (indexed)
    url: str                         # Torrent/magnet URL (unique, indexed)
    rss_id: int                      # Source RSS feed ID
    bangumi_id: int                  # Linked Bangumi ID (nullable)
    qb_hash: str                     # qBittorrent info hash (indexed)
    downloaded: bool = False         # Download completed
```

## Common Operations

### BangumiDatabase

```python
with Database() as db:
    # Create
    db.bangumi.add(bangumi)              # Single insert
    db.bangumi.add_all(bangumi_list)     # Batch insert (deduplicates)

    # Read
    db.bangumi.search_all()              # All records (cached, 5min TTL)
    db.bangumi.search_id(123)            # By ID
    db.bangumi.match_torrent("torrent name")  # Find by title_raw match
    db.bangumi.not_complete()            # Incomplete series
    db.bangumi.get_needs_review()        # Flagged for review

    # Update
    db.bangumi.update(bangumi)           # Update single record
    db.bangumi.update_all(bangumi_list)  # Batch update

    # Delete
    db.bangumi.delete_one(123)           # Hard delete
    db.bangumi.disable_rule(123)         # Soft delete (deleted=True)
```

### RSSDatabase

```python
with Database() as db:
    # Create
    db.rss.add(rss_item)                 # Single insert
    db.rss.add_all(rss_items)            # Batch insert (deduplicates)

    # Read
    db.rss.search_all()                  # All feeds
    db.rss.search_active()               # Enabled feeds only
    db.rss.search_aggregate()            # Enabled + aggregate=True

    # Update
    db.rss.update(id, rss_update)        # Partial update
    db.rss.enable(id)                    # Enable feed
    db.rss.disable(id)                   # Disable feed
    db.rss.enable_batch([1, 2, 3])       # Batch enable
    db.rss.disable_batch([1, 2, 3])      # Batch disable
```

### TorrentDatabase

```python
with Database() as db:
    # Create
    db.torrent.add(torrent)              # Single insert
    db.torrent.add_all(torrents)         # Batch insert

    # Read
    db.torrent.search_all()              # All torrents
    db.torrent.search_by_qb_hash(hash)   # By qBittorrent hash
    db.torrent.search_by_url(url)        # By URL
    db.torrent.check_new(torrents)       # Filter out existing

    # Update
    db.torrent.update_qb_hash(id, hash)  # Set qb_hash
```

## Caching

### Bangumi Cache

`search_all()` results are cached at the module level with a 5-minute TTL:

```python
# Module-level cache in bangumi.py
_bangumi_cache: list[Bangumi] | None = None
_bangumi_cache_time: float = 0
_BANGUMI_CACHE_TTL: float = 300.0  # 5 minutes

# Cache invalidation
def _invalidate_bangumi_cache():
    global _bangumi_cache, _bangumi_cache_time
    _bangumi_cache = None
    _bangumi_cache_time = 0
```

**Important:** The cache is automatically invalidated on:
- `add()`, `add_all()`
- `update()`, `update_all()`
- `delete_one()`, `delete_all()`
- `archive_one()`, `unarchive_one()`
- Any RSS link update operations

### Session Expunge

Cached objects are **expunged** from the session to prevent `DetachedInstanceError`:

```python
for b in bangumis:
    self.session.expunge(b)  # Detach from session
```

## Migration System

### Schema Versioning

Migrations are tracked via a `schema_version` table:

```python
CURRENT_SCHEMA_VERSION = 7

# Each migration: (version, description, [SQL statements])
MIGRATIONS = [
    (1, "add air_weekday column", [...]),
    (2, "add connection status columns", [...]),
    (3, "create passkey table", [...]),
    (4, "add archived column", [...]),
    (5, "rename offset to episode_offset", [...]),
    (6, "add qb_hash column", [...]),
    (7, "add suggested offset columns", [...]),
]
```

### Adding a New Migration

1. Increment `CURRENT_SCHEMA_VERSION` in `combine.py`
2. Add migration tuple to `MIGRATIONS` list:

```python
MIGRATIONS = [
    # ... existing migrations ...
    (
        8,
        "add my_new_column to bangumi",
        [
            "ALTER TABLE bangumi ADD COLUMN my_new_column TEXT DEFAULT NULL",
        ],
    ),
]
```

3. Add idempotency check in `run_migrations()`:

```python
if "bangumi" in tables and version == 8:
    columns = [col["name"] for col in inspector.get_columns("bangumi")]
    if "my_new_column" in columns:
        needs_run = False
```

4. Update the corresponding Pydantic model in `module/models/`

### Default Value Backfill

After migrations, `_fill_null_with_defaults()` automatically fills NULL values based on model defaults:

```python
# If model defines:
class Bangumi(SQLModel, table=True):
    my_field: bool = False

# Then existing rows with NULL will be updated to False
```

## Performance Patterns

### Batch Queries

`add_all()` uses a single query to check for duplicates instead of N queries:

```python
# Efficient: single SELECT
keys_to_check = [(d.title_raw, d.group_name) for d in datas]
conditions = [
    and_(Bangumi.title_raw == tr, Bangumi.group_name == gn)
    for tr, gn in keys_to_check
]
statement = select(Bangumi.title_raw, Bangumi.group_name).where(or_(*conditions))
```

### Regex Matching

`match_list()` compiles a single regex pattern for all title matches:

```python
# Compile once, match many
sorted_titles = sorted(title_index.keys(), key=len, reverse=True)
pattern = "|".join(re.escape(title) for title in sorted_titles)
title_regex = re.compile(pattern)

# O(1) lookup per torrent instead of O(n)
for torrent in torrent_list:
    match = title_regex.search(torrent.name)
```

### Indexed Columns

The following columns have indexes for fast lookups:

| Table | Column | Index Type |
|-------|--------|------------|
| `bangumi` | `title_raw` | Regular |
| `bangumi` | `deleted` | Regular |
| `bangumi` | `archived` | Regular |
| `rssitem` | `url` | Unique |
| `torrent` | `name` | Regular |
| `torrent` | `url` | Unique |
| `torrent` | `qb_hash` | Regular |

## Testing

### Test Database Setup

Tests use an in-memory SQLite database:

```python
# conftest.py
@pytest.fixture
def db_engine():
    engine = create_engine("sqlite:///:memory:")
    SQLModel.metadata.create_all(engine)
    yield engine
    engine.dispose()

@pytest.fixture
def db_session(db_engine):
    with Session(db_engine) as session:
        yield session
```

### Factory Functions

Use factory functions for creating test data:

```python
from test.factories import make_bangumi, make_torrent, make_rss_item

def test_bangumi_search():
    bangumi = make_bangumi(title_raw="Test Title", season=2)
    # ... test logic
```

## Design Notes

### No Foreign Keys

SQLite foreign key enforcement is disabled by default. Relationships (like `Torrent.bangumi_id`) are managed in application logic rather than database constraints.

### Soft Deletes

The `Bangumi.deleted` flag enables soft deletes. Queries should filter by `deleted=False` for user-facing data:

```python
statement = select(Bangumi).where(Bangumi.deleted == false())
```

### Torrent Tagging

Torrents are tagged in qBittorrent with `ab:{bangumi_id}` for offset lookup during rename operations. This enables fast bangumi identification without database queries.

## Common Issues

### DetachedInstanceError

If you access cached objects from a different session:

```python
# Wrong: accessing cached object in new session
bangumis = db.bangumi.search_all()  # Cached
with Database() as new_db:
    new_db.session.add(bangumis[0])  # Error!

# Right: objects are expunged, work independently
bangumis = db.bangumi.search_all()
bangumis[0].title_raw = "New Title"  # OK, but won't persist
```

### Cache Staleness

If manual SQL updates bypass the ORM, invalidate the cache:

```python
from module.database.bangumi import _invalidate_bangumi_cache

with engine.connect() as conn:
    conn.execute(text("UPDATE bangumi SET ..."))
    conn.commit()

_invalidate_bangumi_cache()  # Important!
```