NOKIA GMRE · PRODUCTION-GRADE SOFTWARE

GMRE_logparser — Production Architecture (V2)

Nokia GMRE archive ingestion · plugin-based parsing pipeline · MariaDB with idempotency, sha256 dedup & state-machine ledger
490 source files
42 directories
12 plugins
13 DB migrations
55+ test files
Python 3.9 / 3.12
sha256 idempotency
plugin ledger state machine
stream-read · no full extract
Layer 1 — Entry Points / Interfaces
cli/ · scripts/
OUTPUT → logs/gmre-core.log + exit code
cli/main.py — User Entrypoint
Interactive & non-interactive modes. Argument parsing, delegates to input collection, invokes pipeline. Catches exit codes and writes final summary.
cli/input.py — collect ticket / customer / country / network / HC-ID via prompts or flags
cli/services/db_suggest.py — live DB autocomplete for customer & network selection
cli/logs/gmre-core.log (structured JSON)
scripts/ — Batch, Maintenance & Utility Scripts
backfill_or_ingest_by_tree.py
Walk a directory tree, ingest every valid archive found recursively
BATCH INGEST
ingest_ticket_folders.py
Per-ticket folder ingestion — maps folder names to ticket metadata
BATCH INGEST
dry_run_archive.py
Run reader + parsers, print records — zero DB writes. Safe dev tool.
DRY RUN
dry_run_nea_buckets.py
NEA-specific dry run; inspect bucket count extraction only
DRY RUN
backfill_plugins.py / backfill_plugins1.py
Re-run new plugins against already-ingested dumps (no re-parse of node/DPR)
BACKFILL
db_check.py
Validate DB connectivity & schema sanity checks against live DB
DB TOOL
db_migrate.py / db_migrate2.py / db_reset.py
Alembic migration runners + hard DB reset for dev environments
DB TOOL
sync_customers_from_excel.py
Read Nokia customer Excel sheet, upsert into customers_db table
SYNC UTIL
① invoke pipeline(config, archive_path, ticket_params)
Layer 2 — Application / Orchestration
app/run_pipeline_v2.py · app/service.py
returns RunResult(ok, inserted, errors, run_id)
run_pipeline_v2.py — Main Pipeline Coordinator (7 sequential stages)
a
Config & Logging
app_config.py
logging_setup.py
config/env.py
STARTUP
b
Inspect Archive
archives/plan.py
archives/timestamp.py
→ ArchivePlan
PLAN
c
Stream Members
archives/reader.py
NO full extract
spool limits
STREAM
d
Match Plugins
parse/registry.py
parse/targets.py
overlap contract
DISPATCH
e
Run Plugins
parse/engine.py
plugin.parse(stream)
parse/spool.py
PARSE
f
Route Records
parse/router.py
domain/records.py
normalize + validate
ROUTE
g
Persist + Dedup
DB/inserter.py
sha256 + ledger
sessions + models
COMMIT
⚡ Stream-Read Design
Archives are never fully extracted to disk.
reader.py yields file members as IO streams.
spool.py enforces byte limits per member.
Memory-safe for >5GB nested .tgz files.
🔒 SHA-256 Fingerprinting
Every archive is fingerprinted before pipeline runs.
fingerprint.py → archive_sha256 DB table.
Exact duplicate archives are skipped instantly.
Re-runs are safe — zero double-inserts.
📋 Plugin Ledger State Machine
Each plugin's result is tracked per-dump in plugin_ledger.
States: PENDING → RUNNING → DONE / FAILED.
Enables partial backfill: add a new plugin, re-run only that plugin against all existing dumps.
🛡️ Guardrail System
Same-day snapshot guards (6h window).
Overwrite purges child records atomically.
HC-ID overwrite policy matrix enforced.
DB unique constraints as final safety net.
② typed domain records + archive streams + ParseContext
Layer 3 — Domain
domain/
domain/records.py
Typed Python dataclasses for every record kind. Pure data — no I/O.
·NodeRecord
·LspRecord
·NeaBucketRecord
·MaNotifRecord
·AlarmRecord
·DprRecord / 3RRecord
domain/dto.py
DTOs crossing layer boundaries.
·RunConfig
·ArchiveMeta
·ParseContext
·InsertResult
Normalization
✓ version string normalization
✓ missing mgmt_ip fallback
✓ router_id consistency
✓ NE identity dedup key
Layer 4 — Infrastructure
infra/
📦4.1 Archive Handling
archives/plan.py
inspect_archive() → ArchivePlan. Determines which inner archive chain to use.
·chain / extra_chains detection
·ECS81 over non-ECS81 preference
·ACT-preferred over STDBY
·active_only_fallback logic
·depth-limited nested .tgz walk
archives/reader.py
Stream-first design — yields IO streams for each matched member, never writes to disk.
·iter_streams() lazy generator
·spool limits per-member
·path-aware basename matching
·Windows backslash path handling
·corrupt inner member recovery
·case-insensitive filename match
·multi-target single-pass read
archives/timestamp.py
Two-strategy timestamp inference — name hints win, log hints as fallback.
1.Name: datetime regex on filename
2.Log: first-line log timestamp scan
🔌4.2 Parsing & Plugin Framework
plugin_api.py
PluginBase ABC: name, targets[], priority, parse(stream) → List[Record]
registry.py
match_plugins(basename) → ordered list. Overlap contract enforced at registration.
engine.py
Drives each plugin over its stream. Collects ParseResult list.
router.py
Routes typed records to correct inserter method by record type discriminator.
targets.py · spool.py
File pattern registry. Stream spooling with configurable byte limit safety valve.
Low-level Parsers
parsers.py (base)
parsers_nea.py
parsers_ma.py
parsers_3r.py
Plugin Registry (12 registered plugins)
node_plugin.py
Parse Node state: version, mgmt_ip, router_id, NE identity
CLI_Show_Node.out
CORE
dpr_db_plugin.py
DPR database table extraction and structured record mapping
CLI_Show_DPR_DB.out
CORE
plugins/lsp.py
LSP entry ingestion with unique constraint (dump×NE×from×to×label)
CLI_Show_LSP.out
PLUGIN
plugins/nea_trace.py
NEA trace file parsing — extract trace events per timestamp
NEA trace files
PLUGIN
plugins/nea_bucket_counts.py
NEA bucket counter ingestion for capacity & utilization analysis
NEA bucket files
PLUGIN
plugins/ma_notif_bucket_counts.py
MA notification bucket counters — maps to ma_notif_bucket_count table
MA notif files
PLUGIN
plugins/alarm_state.py
Active alarm state snapshot ingestion per CaptureDump
alarm state output
PLUGIN
plugins/dpr_3r_plugin.py
3R group extraction — redundancy group membership per NE
DPR 3R output
PLUGIN
plugins/dpr_links.py
DPR inter-NE link mapping — source/dest/type per CaptureDump
DPR link file
PLUGIN
plugins/dpr_router_map.py
Router-ID to NE mapping table for cross-reference resolution
DPR router map
PLUGIN
infra/parse/alarm_state.py
Root-level alarm parser (legacy path, kept for backward compat)
alarm_state (root)
LEGACY
dummy_plugin.py ⚗️
Null plugin — test scaffold, used to verify registry & engine plumbing
testing only
TEST ONLY
⚙️4.3 Config + IO + Observability
infra/config/env.py
Loads .env / environment. Provides DB_URL, LOG_LEVEL, SPOOL_LIMIT, dry_run flag.
infra/app_config.py
AppConfig dataclass. Validated at startup. Controls overwrite, active_only, dry_run flags. Single source of truth for run behaviour.
io/run_cfg.py · legacy_cfg.py
Per-run config. Legacy shim for older CLI invocations — bridges old arg format to AppConfig.
io/customers_db.py
Customer lookup & upsert from DB. Powers db_suggest.py autocomplete. Handles Excel sync.
io/customers_sheet.py
Read Nokia customer records from Excel workbook for initial DB seeding.
io/local_store.py
Local filesystem cache / staging for downloaded or temporary archives.
io/fingerprint.py
Core of idempotency. Computes SHA-256 of archive before any parsing. Writes to archive_sha256 table. If hash seen → skip.
logging_setup.py · timeutils.py
Structured JSON logging to gmre-core.log. ISO timestamp normalization across log events.
③ transactional insert via ORM sessions — commit or rollback
Layer 5 — Data Layer: DB Access + Alembic Migrations + MariaDB
DB/ · db_migrations/ · alembic.ini
DB Access Layer
DB/sessions.py
SQLAlchemy engine factory + scoped session. Connection pool management. Proxy contract tested.
DB/models.py
All ORM table definitions. Declares relationships, unique constraints, cascade rules for purge-children.
DB/inserter.py
Transactional insert with:
• overwrite-purge-children logic
• same-day snapshot guard
• 6h window guard
• HC-ID overwrite matrix
• constraint retry on conflict
Alembic Migrations — 13 Versions (chronological)
Revision HashMigration Description
22925ffebaseline_schema — Network, NetworkElement, CaptureDump, Customer
99e812d1add_customer_table — dedicated Customer entity
2936a356add_lsp_entry_table — LSP records per CaptureDump
65bab2a4add_3r_group_tables — redundancy group membership
7ffc05d2add_nea_bucket_count — NEA utilization counters
8f225311add_plugin_ledger_and_archive_sha256 KEY — enables idempotency engine
10e362f0add_ingestion_ledger — per-dump ingestion state tracking
b59052c9add_alarm_state_entry — alarm snapshot table
d31a690cadd_ma_notif_bucket_count — MA notification counters
440dd4ffupdate_three_r_group_and_r_group_schema — schema refinement
fd71ca0aadd_router_id_and_ne_link — extend NE with router_id + link ref
d8ae72a0fix_lsp_unique_key_and_enforce_router_id KEY — data integrity enforcement
d76e01dedrop_old_ingestion_ledger — clean up deprecated table
MariaDB Schema — Logical Table Groups
Network
Customer
NetworkElement
CaptureDump
Core Entities
lsp_entry
unique: dump × NE
× from × to
× label
LSP Table
nea_bucket_count
ma_notif_
bucket_count
alarm_state_entry
Counter + State
three_r_group
three_r_link
dpr_router_map
nea_trace
DPR / 3R Tables
plugin_ledger
ingestion_ledger
archive_sha256
state machine
PENDING→DONE
Dedup / Ledger
Test Harness — Full Coverage (55+ test files · htmlcov tracked)
tests/
pytest.ini · conftest.py · htmlcov/
tests/unit/ — 29 files
test_reader_iter_streams test_reader_corrupt_inner_member test_reader_early_exit test_reader_respects_max_depth test_reader_windows_backslash_paths test_reader_case_insensitive test_reader_missing_target test_reader_multiple_targets_one_pass test_reader_lines_generator_lifetime test_registry_contract test_registry_match_order test_registry_multi_match test_registry_single_match test_plugin_meta_and_overlap_contract test_ts_from_name test_ts_hint_priority test_sha256_file test_dpr_links_parse test_dpr_router_map_parse test_ma_notif_bucket_counts_parse test_cli_modes_mode1_mode2 test_env_loader test_backfill_or_ingest_by_tree_deep test_backfill_run_calls_pipeline test_backfill_plugins test_ingest_ticket_folders_batch test_sessions_proxy_contract test_customers_sheet_read test_zero_cov_modules_import_smoke
tests/integration/ — 15 files
test_pipeline_smoke test_pipeline_db test_pipeline_uses_chain test_lsp_ingestion test_duplicate_sha_policy_integration test_sha_duplicate_behaviour test_sha_different_creates_new_dump test_hc_id_overwrite_policy test_same_day_new_snapshot test_same_day_different_origin_snapshot test_snapshot_6h_guard test_network_identity_idempotent test_network_identity_uniqueness test_backfill_plugins_integration test_customers_db_upsert
tests/db/ — 11 files
test_constraints test_constraints_links_and_3r test_guardrails test_plugin_ledger_state_machine test_plugin_db_write_guards test_overwrite_purge_children test_sha_duplicate_policy test_prepare_context_duplicate_policy_matrix test_multi_plugin_same_file_order_required test_schema_contract test_db_bootstrap
tests/fixtures/archives/ — 24 .tgz
example_small.tgz example_small_v2.tgz example_small_v2_with_lsp.tgz example_nested_chain.tgz example_nested_chain_act_240118.tgz example_deep_nested_depth8.tgz example_corrupt_inner.tgz example_active_only_fallback.tgz example_act_vs_stdby.tgz example_stdby_only.tgz example_ecs81_preferred.tgz example_case.tgz origin_a.tgz / origin_b.tgz
📄 tests/golden/golden_case_bharti_2025-01-18.json
📁 make_*.py — fixture generators (8 scripts)
Legend
data / record flow
config / control
domain records
plugin invocation
MariaDB table group
Top → bottom = data flow.
①②③ = pipeline stages.
⚡ 11-Step Data Flow
1User / script invokes CLI
2cli/main.py · scripts/*
3run_pipeline_v2.py
4fingerprint.py → sha256 check
5plan.py → ArchivePlan
6timestamp.py → capture_ts
7reader.py → stream members
8registry → engine → plugins
9router → domain records
10inserter → MariaDB commit
11logs + RunResult + exit code
⟳ Idempotency & Dedup
sha256 archive fingerprint
archive_sha256 DB table
plugin_ledger state machine
ingestion_ledger per-dump
unique DB constraints
re-runs always safe
🛡️ Guardrail System
overwrite_purge_children
same-day snapshot guard
6-hour window policy
HC-ID overwrite matrix
duplicate policy matrix
schema contract checks
⚠ Error Handling
corrupt inner archive recovery
missing target → skip + log
parse errors → partial result
DB constraint → rollback
multi-plugin ordering enforced
windows path encoding fix
🏭 Production Highlights
Stream-read, no disk extract
SHA-256 dedup engine
Plugin ledger state machine
Partial backfill support
13 tracked migrations
55+ test files
htmlcov coverage report
Python 3.9 & 3.12 dual compat
Dry-run mode (zero DB risk)
Atomic overwrite-purge-children
ECS81 / ACT preference logic
Depth-limited nested .tgz walk
GMRE_logparser · Production Architecture V2 · 490 files · 42 dirs · Python 3.9/3.12 · SQLAlchemy + Alembic + MariaDB · Designed by Priyanshu