Layer 1 — Entry Points / Interfaces
cli/ · scripts/
OUTPUT → logs/gmre-core.log + exit code
cli/main.py — User Entrypoint
Interactive & non-interactive modes. Argument parsing, delegates to input collection, invokes pipeline. Catches exit codes and writes final summary.
↳cli/input.py — collect ticket / customer / country / network / HC-ID via prompts or flags
↳cli/services/db_suggest.py — live DB autocomplete for customer & network selection
→cli/logs/gmre-core.log (structured JSON)
scripts/ — Batch, Maintenance & Utility Scripts
backfill_or_ingest_by_tree.py
Walk a directory tree, ingest every valid archive found recursively
BATCH INGESTingest_ticket_folders.py
Per-ticket folder ingestion — maps folder names to ticket metadata
BATCH INGESTdry_run_archive.py
Run reader + parsers, print records — zero DB writes. Safe dev tool.
DRY RUNdry_run_nea_buckets.py
NEA-specific dry run; inspect bucket count extraction only
DRY RUNbackfill_plugins.py / backfill_plugins1.py
Re-run new plugins against already-ingested dumps (no re-parse of node/DPR)
BACKFILLdb_check.py
Validate DB connectivity & schema sanity checks against live DB
DB TOOLdb_migrate.py / db_migrate2.py / db_reset.py
Alembic migration runners + hard DB reset for dev environments
DB TOOLsync_customers_from_excel.py
Read Nokia customer Excel sheet, upsert into customers_db table
SYNC UTIL① invoke pipeline(config, archive_path, ticket_params)
Layer 2 — Application / Orchestration
app/run_pipeline_v2.py · app/service.py
returns RunResult(ok, inserted, errors, run_id)
run_pipeline_v2.py — Main Pipeline Coordinator (7 sequential stages)
a
Config & Logging
app_config.py
logging_setup.py
config/env.py
logging_setup.py
config/env.py
STARTUP
b
Inspect Archive
archives/plan.py
archives/timestamp.py
→ ArchivePlan
archives/timestamp.py
→ ArchivePlan
PLAN
c
Stream Members
archives/reader.py
NO full extract
spool limits
NO full extract
spool limits
STREAM
d
Match Plugins
parse/registry.py
parse/targets.py
overlap contract
parse/targets.py
overlap contract
DISPATCH
e
Run Plugins
parse/engine.py
plugin.parse(stream)
parse/spool.py
plugin.parse(stream)
parse/spool.py
PARSE
f
Route Records
parse/router.py
domain/records.py
normalize + validate
domain/records.py
normalize + validate
ROUTE
g
Persist + Dedup
DB/inserter.py
sha256 + ledger
sessions + models
sha256 + ledger
sessions + models
COMMIT
⚡ Stream-Read Design
Archives are never fully extracted to disk.
reader.py yields file members as IO streams.
spool.py enforces byte limits per member.
Memory-safe for >5GB nested .tgz files.
reader.py yields file members as IO streams.
spool.py enforces byte limits per member.
Memory-safe for >5GB nested .tgz files.
🔒 SHA-256 Fingerprinting
Every archive is fingerprinted before pipeline runs.
fingerprint.py → archive_sha256 DB table.
Exact duplicate archives are skipped instantly.
Re-runs are safe — zero double-inserts.
fingerprint.py → archive_sha256 DB table.
Exact duplicate archives are skipped instantly.
Re-runs are safe — zero double-inserts.
📋 Plugin Ledger State Machine
Each plugin's result is tracked per-dump in plugin_ledger.
States: PENDING → RUNNING → DONE / FAILED.
Enables partial backfill: add a new plugin, re-run only that plugin against all existing dumps.
States: PENDING → RUNNING → DONE / FAILED.
Enables partial backfill: add a new plugin, re-run only that plugin against all existing dumps.
🛡️ Guardrail System
Same-day snapshot guards (6h window).
Overwrite purges child records atomically.
HC-ID overwrite policy matrix enforced.
DB unique constraints as final safety net.
Overwrite purges child records atomically.
HC-ID overwrite policy matrix enforced.
DB unique constraints as final safety net.
② typed domain records + archive streams + ParseContext
Layer 3 — Domain
domain/
domain/records.py
Typed Python dataclasses for every record kind. Pure data — no I/O.
·NodeRecord
·LspRecord
·NeaBucketRecord
·MaNotifRecord
·AlarmRecord
·DprRecord / 3RRecord
domain/dto.py
DTOs crossing layer boundaries.
·RunConfig
·ArchiveMeta
·ParseContext
·InsertResult
Normalization
✓ version string normalization
✓ missing mgmt_ip fallback
✓ router_id consistency
✓ NE identity dedup key
Layer 4 — Infrastructure
infra/
4.1 Archive Handling
archives/plan.py
inspect_archive() → ArchivePlan. Determines which inner archive chain to use.
·chain / extra_chains detection
·ECS81 over non-ECS81 preference
·ACT-preferred over STDBY
·active_only_fallback logic
·depth-limited nested .tgz walk
archives/reader.py
Stream-first design — yields IO streams for each matched member, never writes to disk.
·iter_streams() lazy generator
·spool limits per-member
·path-aware basename matching
·Windows backslash path handling
·corrupt inner member recovery
·case-insensitive filename match
·multi-target single-pass read
archives/timestamp.py
Two-strategy timestamp inference — name hints win, log hints as fallback.
1.Name: datetime regex on filename
2.Log: first-line log timestamp scan
4.2 Parsing & Plugin Framework
plugin_api.py
PluginBase ABC: name, targets[], priority, parse(stream) → List[Record]
registry.py
match_plugins(basename) → ordered list. Overlap contract enforced at registration.
engine.py
Drives each plugin over its stream. Collects ParseResult list.
router.py
Routes typed records to correct inserter method by record type discriminator.
targets.py · spool.py
File pattern registry. Stream spooling with configurable byte limit safety valve.
Low-level Parsers
parsers.py (base)
parsers_nea.py
parsers_ma.py
parsers_3r.py
Plugin Registry (12 registered plugins)
node_plugin.py
Parse Node state: version, mgmt_ip, router_id, NE identity
CLI_Show_Node.out
CORE
dpr_db_plugin.py
DPR database table extraction and structured record mapping
CLI_Show_DPR_DB.out
CORE
plugins/lsp.py
LSP entry ingestion with unique constraint (dump×NE×from×to×label)
CLI_Show_LSP.out
PLUGIN
plugins/nea_trace.py
NEA trace file parsing — extract trace events per timestamp
NEA trace files
PLUGIN
plugins/nea_bucket_counts.py
NEA bucket counter ingestion for capacity & utilization analysis
NEA bucket files
PLUGIN
plugins/ma_notif_bucket_counts.py
MA notification bucket counters — maps to ma_notif_bucket_count table
MA notif files
PLUGIN
plugins/alarm_state.py
Active alarm state snapshot ingestion per CaptureDump
alarm state output
PLUGIN
plugins/dpr_3r_plugin.py
3R group extraction — redundancy group membership per NE
DPR 3R output
PLUGIN
plugins/dpr_links.py
DPR inter-NE link mapping — source/dest/type per CaptureDump
DPR link file
PLUGIN
plugins/dpr_router_map.py
Router-ID to NE mapping table for cross-reference resolution
DPR router map
PLUGIN
infra/parse/alarm_state.py
Root-level alarm parser (legacy path, kept for backward compat)
alarm_state (root)
LEGACY
dummy_plugin.py ⚗️
Null plugin — test scaffold, used to verify registry & engine plumbing
testing only
TEST ONLY
4.3 Config + IO + Observability
infra/config/env.py
Loads .env / environment. Provides DB_URL, LOG_LEVEL, SPOOL_LIMIT, dry_run flag.
infra/app_config.py
AppConfig dataclass. Validated at startup. Controls overwrite, active_only, dry_run flags. Single source of truth for run behaviour.
io/run_cfg.py · legacy_cfg.py
Per-run config. Legacy shim for older CLI invocations — bridges old arg format to AppConfig.
io/customers_db.py
Customer lookup & upsert from DB. Powers db_suggest.py autocomplete. Handles Excel sync.
io/customers_sheet.py
Read Nokia customer records from Excel workbook for initial DB seeding.
io/local_store.py
Local filesystem cache / staging for downloaded or temporary archives.
io/fingerprint.py
Core of idempotency. Computes SHA-256 of archive before any parsing. Writes to archive_sha256 table. If hash seen → skip.
logging_setup.py · timeutils.py
Structured JSON logging to gmre-core.log. ISO timestamp normalization across log events.
③ transactional insert via ORM sessions — commit or rollback
Layer 5 — Data Layer: DB Access + Alembic Migrations + MariaDB
DB/ · db_migrations/ · alembic.ini
DB Access Layer
DB/sessions.py
SQLAlchemy engine factory + scoped session. Connection pool management. Proxy contract tested.
DB/models.py
All ORM table definitions. Declares relationships, unique constraints, cascade rules for purge-children.
DB/inserter.py
Transactional insert with:
• overwrite-purge-children logic
• same-day snapshot guard
• 6h window guard
• HC-ID overwrite matrix
• constraint retry on conflict
• overwrite-purge-children logic
• same-day snapshot guard
• 6h window guard
• HC-ID overwrite matrix
• constraint retry on conflict
Alembic Migrations — 13 Versions (chronological)
| Revision Hash | Migration Description |
|---|---|
| 22925ffe | baseline_schema — Network, NetworkElement, CaptureDump, Customer |
| 99e812d1 | add_customer_table — dedicated Customer entity |
| 2936a356 | add_lsp_entry_table — LSP records per CaptureDump |
| 65bab2a4 | add_3r_group_tables — redundancy group membership |
| 7ffc05d2 | add_nea_bucket_count — NEA utilization counters |
| 8f225311 | add_plugin_ledger_and_archive_sha256 KEY — enables idempotency engine |
| 10e362f0 | add_ingestion_ledger — per-dump ingestion state tracking |
| b59052c9 | add_alarm_state_entry — alarm snapshot table |
| d31a690c | add_ma_notif_bucket_count — MA notification counters |
| 440dd4ff | update_three_r_group_and_r_group_schema — schema refinement |
| fd71ca0a | add_router_id_and_ne_link — extend NE with router_id + link ref |
| d8ae72a0 | fix_lsp_unique_key_and_enforce_router_id KEY — data integrity enforcement |
| d76e01de | drop_old_ingestion_ledger — clean up deprecated table |
MariaDB Schema — Logical Table Groups
Network
Customer
NetworkElement
CaptureDump
Customer
NetworkElement
CaptureDump
Core Entities
lsp_entry
unique: dump × NE
× from × to
× label
unique: dump × NE
× from × to
× label
LSP Table
nea_bucket_count
ma_notif_
bucket_count
alarm_state_entry
ma_notif_
bucket_count
alarm_state_entry
Counter + State
three_r_group
three_r_link
dpr_router_map
nea_trace
three_r_link
dpr_router_map
nea_trace
DPR / 3R Tables
plugin_ledger
ingestion_ledger
archive_sha256
state machine
PENDING→DONE
ingestion_ledger
archive_sha256
state machine
PENDING→DONE
Dedup / Ledger
Test Harness — Full Coverage (55+ test files · htmlcov tracked)
tests/
pytest.ini · conftest.py · htmlcov/
tests/unit/ — 29 files
test_reader_iter_streams
test_reader_corrupt_inner_member
test_reader_early_exit
test_reader_respects_max_depth
test_reader_windows_backslash_paths
test_reader_case_insensitive
test_reader_missing_target
test_reader_multiple_targets_one_pass
test_reader_lines_generator_lifetime
test_registry_contract
test_registry_match_order
test_registry_multi_match
test_registry_single_match
test_plugin_meta_and_overlap_contract
test_ts_from_name
test_ts_hint_priority
test_sha256_file
test_dpr_links_parse
test_dpr_router_map_parse
test_ma_notif_bucket_counts_parse
test_cli_modes_mode1_mode2
test_env_loader
test_backfill_or_ingest_by_tree_deep
test_backfill_run_calls_pipeline
test_backfill_plugins
test_ingest_ticket_folders_batch
test_sessions_proxy_contract
test_customers_sheet_read
test_zero_cov_modules_import_smoke
tests/integration/ — 15 files
test_pipeline_smoke
test_pipeline_db
test_pipeline_uses_chain
test_lsp_ingestion
test_duplicate_sha_policy_integration
test_sha_duplicate_behaviour
test_sha_different_creates_new_dump
test_hc_id_overwrite_policy
test_same_day_new_snapshot
test_same_day_different_origin_snapshot
test_snapshot_6h_guard
test_network_identity_idempotent
test_network_identity_uniqueness
test_backfill_plugins_integration
test_customers_db_upsert
tests/db/ — 11 files
test_constraints
test_constraints_links_and_3r
test_guardrails
test_plugin_ledger_state_machine
test_plugin_db_write_guards
test_overwrite_purge_children
test_sha_duplicate_policy
test_prepare_context_duplicate_policy_matrix
test_multi_plugin_same_file_order_required
test_schema_contract
test_db_bootstrap
tests/fixtures/archives/ — 24 .tgz
example_small.tgz
example_small_v2.tgz
example_small_v2_with_lsp.tgz
example_nested_chain.tgz
example_nested_chain_act_240118.tgz
example_deep_nested_depth8.tgz
example_corrupt_inner.tgz
example_active_only_fallback.tgz
example_act_vs_stdby.tgz
example_stdby_only.tgz
example_ecs81_preferred.tgz
example_case.tgz
origin_a.tgz / origin_b.tgz
📄 tests/golden/golden_case_bharti_2025-01-18.json
📁 make_*.py — fixture generators (8 scripts)
Legend
data / record flow
config / control
domain records
plugin invocation
MariaDB table group
Top → bottom = data flow.
①②③ = pipeline stages.
①②③ = pipeline stages.
⚡ 11-Step Data Flow
1User / script invokes CLI
↓
2cli/main.py · scripts/*
↓
3run_pipeline_v2.py
↓
4fingerprint.py → sha256 check
↓
5plan.py → ArchivePlan
↓
6timestamp.py → capture_ts
↓
7reader.py → stream members
↓
8registry → engine → plugins
↓
9router → domain records
↓
10inserter → MariaDB commit
↓
11logs + RunResult + exit code
⟳ Idempotency & Dedup
sha256 archive fingerprint
archive_sha256 DB table
plugin_ledger state machine
ingestion_ledger per-dump
unique DB constraints
re-runs always safe
archive_sha256 DB table
plugin_ledger state machine
ingestion_ledger per-dump
unique DB constraints
re-runs always safe
🛡️ Guardrail System
overwrite_purge_children
same-day snapshot guard
6-hour window policy
HC-ID overwrite matrix
duplicate policy matrix
schema contract checks
same-day snapshot guard
6-hour window policy
HC-ID overwrite matrix
duplicate policy matrix
schema contract checks
⚠ Error Handling
corrupt inner archive recovery
missing target → skip + log
parse errors → partial result
DB constraint → rollback
multi-plugin ordering enforced
windows path encoding fix
missing target → skip + log
parse errors → partial result
DB constraint → rollback
multi-plugin ordering enforced
windows path encoding fix
🏭 Production Highlights
✓ Stream-read, no disk extract
✓ SHA-256 dedup engine
✓ Plugin ledger state machine
✓ Partial backfill support
✓ 13 tracked migrations
✓ 55+ test files
✓ htmlcov coverage report
✓ Python 3.9 & 3.12 dual compat
✓ Dry-run mode (zero DB risk)
✓ Atomic overwrite-purge-children
✓ ECS81 / ACT preference logic
✓ Depth-limited nested .tgz walk
✓ SHA-256 dedup engine
✓ Plugin ledger state machine
✓ Partial backfill support
✓ 13 tracked migrations
✓ 55+ test files
✓ htmlcov coverage report
✓ Python 3.9 & 3.12 dual compat
✓ Dry-run mode (zero DB risk)
✓ Atomic overwrite-purge-children
✓ ECS81 / ACT preference logic
✓ Depth-limited nested .tgz walk