Skip to content

Redistribute static + IPv6 routes#2279

Open
fzakfeld wants to merge 1 commit into
mainfrom
distribute-static-routes
Open

Redistribute static + IPv6 routes#2279
fzakfeld wants to merge 1 commit into
mainfrom
distribute-static-routes

Conversation

@fzakfeld
Copy link
Copy Markdown
Contributor

No description provided.

@fzakfeld fzakfeld requested a review from osfrickler May 18, 2026 16:35
@berendt berendt requested a review from ideaship May 18, 2026 16:36
@fzakfeld fzakfeld force-pushed the distribute-static-routes branch from 2d7241e to 60565e6 Compare May 18, 2026 16:36
Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The _evpn_system_mac and _sag_gwmac values are only checked for truthiness/type; consider validating them against a proper MAC address format so bad config_context data fails fast and predictably.
  • In _add_vlan_configuration, SAG_GLOBAL['IP'] enables both IPv4 and IPv6 unconditionally; if a device only has anycast addresses for a single address family, you might want to derive which families to enable from the collected ipv4_anycast / ipv6_anycast lists instead of always enabling both.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_evpn_system_mac` and `_sag_gwmac` values are only checked for truthiness/type; consider validating them against a proper MAC address format so bad config_context data fails fast and predictably.
- In `_add_vlan_configuration`, `SAG_GLOBAL['IP']` enables both IPv4 and IPv6 unconditionally; if a device only has anycast addresses for a single address family, you might want to derive which families to enable from the collected `ipv4_anycast` / `ipv6_anycast` lists instead of always enabling both.

## Individual Comments

### Comment 1
<location path="osism/tasks/conductor/sonic/config_generator.py" line_range="2154" />
<code_context>
                 "min_links": pc_data["min_links"],
                 "mtu": pc_data["mtu"],
             }
+            if pc_data.get("evpn_lag") and evpn_system_mac:
+                pc_config["system_mac"] = evpn_system_mac
+            config["PORTCHANNEL"][pc_name] = pc_config
</code_context>
<issue_to_address>
**suggestion:** EVPN LAG tag without evpn_system_mac is silently ignored, which may be hard to debug.

Here we only set `system_mac` when `evpn_lag` is true *and* `evpn_system_mac` is valid. If a port channel is tagged as EVPN LAG but `evpn_system_mac` is missing or invalid, the tag has no effect and the `system_mac` is silently omitted. Consider logging or warning when `pc_data.get("evpn_lag")` is true but `evpn_system_mac` is falsy to surface misconfigurations.

Suggested implementation:

```python
            if pc_data.get("evpn_lag") and evpn_system_mac:
                pc_config["system_mac"] = evpn_system_mac
            elif pc_data.get("evpn_lag") and not evpn_system_mac:
                logger.warning(
                    "Port-channel '%s' is tagged as EVPN LAG but no valid evpn_system_mac is set; "
                    "EVPN LAG configuration will be ignored for this port-channel.",
                    pc_name,
                )
            config["PORTCHANNEL"][pc_name] = pc_config

```

1. Ensure that a `logger` object is available in this module. If it is not already defined, add at the top of `config_generator.py`:
   - `import logging`
   - `logger = logging.getLogger(__name__)`
2. If the project convention uses a different logger name (e.g. `LOG` or `log`), or a shared logger utility, adjust the `logger.warning(...)` call accordingly to match existing patterns.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread osism/tasks/conductor/sonic/config_generator.py Outdated
Signed-off-by: Freerk-Ole Zakfeld <[email protected]>
@fzakfeld fzakfeld force-pushed the distribute-static-routes branch from 60565e6 to bd1d549 Compare May 18, 2026 16:56
Copy link
Copy Markdown
Contributor

@ideaship ideaship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this change would be under the following assumption: entries whose key and value are fully derived from NetBox plus hardcoded SONiC policy are fully owned by the generator and should always be overwritten on regen.

Under that rule, the unconditional assignment at line 1989 (config["ROUTE_REDISTRIBUTE"][key] = {}) is correct. If the intent is that operators can layer route-maps onto generated VRFs, that is a feature request that requires changing the generator's ownership model — it is not a bug in the current code.

The reason for flagging this explicitly: another reviewer may dispute the assumption. generate_sonic_config() deep-copies /etc/sonic/config_db.json, and the PR expands the set of overwritten keys from one (connected|bgp|ipv4) to four, including the static variants where operator policy is most plausible. If the rule above is intentional and documented somewhere, a pointer here would close the question.

logger.info(f"Added ROUTE_REDISTRIBUTE {route_redistribute_key}")
for proto, af in [
("connected", "ipv4"),
("connected", "ipv6"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: IPv6 redistribution entries have no corresponding IPv6 unicast AF

This loop emits {vrf}|connected|bgp|ipv6 and {vrf}|static|bgp|ipv6 into ROUTE_REDISTRIBUTE, but the BGP_GLOBALS_AF block above (lines 1952–1970) only creates {vrf}|ipv4_unicast and {vrf}|l2vpn_evpn for VNI-backed VRFs — it never creates {vrf}|ipv6_unicast. The IPv6 redistribution entries are therefore structurally orphaned for tenant VRFs: there is no IPv6 unicast AF to redistribute into. Either add a {vrf}|ipv6_unicast entry to BGP_GLOBALS_AF alongside the existing ipv4_unicast entry, or explain why the AF is not needed here.

@@ -1979,9 +1979,15 @@ def _add_vrf_configuration(config, vrf_info, netbox_interfaces):
# Add ROUTE_REDISTRIBUTE for VRF
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for ROUTE_REDISTRIBUTE

There are currently no tests for emitted ROUTE_REDISTRIBUTE entries — _add_vrf_configuration() is only seen in the orchestrator test as a patch target, with no assertions on its output. This PR changes the emitted entry count from one to four, which is a behavioral change worth pinning. Please add a unit test (alongside the existing BGP globals tests in _config_generator_helpers.py) that calls _add_vrf_configuration() with a VNI-backed VRF fixture and asserts that all four expected keys are present in the resulting ROUTE_REDISTRIBUTE table.

for proto, af in [
("connected", "ipv4"),
("connected", "ipv6"),
("static", "ipv4"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please confirm: all static routes from every VNI-backed VRF redistributed into BGP with no filtering

With the addition of ("static", "ipv4") and ("static", "ipv6"), this change redistributes every static route from every VNI-backed VRF into BGP unconditionally — no route-map, no prefix-list, no opt-out per VRF. Static routes often include defaults, management escape routes, or other entries that should not leave the local VRF. Is it intentional that there is no filtering mechanism here? If selective export is expected in the future, this is worth noting now so the interface can be designed for it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best if we wait for #2173 which will introduce a way of installing default routes per VRF with default_route_ipv4 and default_route_ipv6. I only need redistribute static in VRFs where this is set anyways, so I think it makes total sense to change that.

Alternative would be a default-originate on the peering sessions to the servers, but this ignores the actual route's presence and would need to be passed to the other switches as well. I much prefer a redistribute static

@github-project-automation github-project-automation Bot moved this from Ready to In review in Human Board May 19, 2026
ideaship added a commit that referenced this pull request May 28, 2026
The SONiC config generator owns config_db.json in full: it generates the
file from NetBox data and hardcoded SONiC policy, and operators must not
make manual adjustments to it.  This ownership model was implicit and
undocumented, so each PR touching the generator relitigated which
sections it owns (surfaced in review of #2279 and #2290).

Document the model in the generate_sonic_config() docstring as whole-file
ownership: the generator overwrites every section it manages, and no
sections pass operator edits through unchanged.  Note the two fields the
generator does not itself emit and instead inherits from the device's
image-provided base config (the DEVICE_METADATA localhost device
attributes and the DATABASE schema VERSION); these come from the image,
not from operator hand-edits.

Bring the code into line with the documented model:

  - Rebuild owned tables from scratch instead of overwriting only the
    entries touched in the current run.  generate_sonic_config() deep-
    copies the base config and most helpers assign only the keys they
    currently generate, so an entry removed from NetBox but still present
    in config_db.json (e.g. an old VLAN_MEMBER, BGP_GLOBALS VRF, or
    VXLAN_TUNNEL_MAP) survived regen as stale, active config.  Drop the
    content of every owned table up front (OWNED_TABLE_KEYS: every
    generated table except the inherited DEVICE_METADATA and VERSIONS) so
    only regenerated entries remain.
  - BGP_GLOBALS['default'] was written with key-level updates that merged
    into any pre-existing entry, so operator fields survived regen.
    Replace the entry wholesale when a primary IP is present, like every
    other generated VRF.
  - STATIC_ROUTE merged the management default route into the dict loaded
    from config_db.json, so pre-existing routes (e.g. an operator's
    custom blackhole or VRF route) survived regen.  Clearing the owned
    tables now drops them before the management route is written.

Add tests pinning the model.  Illustrative helper tests
(test_config_generator_ownership.py) show that _add_vrf_configuration and
_add_vlan_configuration replace BGP_GLOBALS[vrf], VLAN entries and
ROUTE_REDISTRIBUTE keys wholesale.  Orchestrator tests
(test_config_generator_orchestrator.py) assert that pre-existing
BGP_GLOBALS['default'] fields, STATIC_ROUTE entries, and stale owned-table
entries are dropped on regen while DEVICE_METADATA and VERSIONS survive.

AI-assisted: Claude Code
Signed-off-by: Roger Luethi <[email protected]>
ideaship added a commit that referenced this pull request May 29, 2026
The SONiC config generator owns config_db.json in full: it generates the
file from NetBox data and hardcoded SONiC policy, and operators must not
make manual adjustments to it.  This ownership model was implicit and
undocumented, so each PR touching the generator relitigated which
sections it owns (surfaced in review of #2279 and #2290).

Document the model in the generate_sonic_config() docstring as whole-file
ownership: the generator overwrites every section it manages, and no
sections pass operator edits through unchanged.  Note the two fields the
generator does not itself emit and instead inherits from the device's
image-provided base config (the DEVICE_METADATA localhost device
attributes and the DATABASE schema VERSION); these come from the image,
not from operator hand-edits.

Bring the code into line with the documented model:

  - Rebuild owned tables from scratch instead of overwriting only the
    entries touched in the current run.  generate_sonic_config() deep-
    copies the base config and most helpers assign only the keys they
    currently generate, so an entry removed from NetBox but still present
    in config_db.json (e.g. an old VLAN_MEMBER, BGP_GLOBALS VRF, or
    VXLAN_TUNNEL_MAP) survived regen as stale, active config.  Drop the
    content of every owned table up front and only regenerated entries
    remain.  OWNED_TABLE_KEYS is composed from two named sub-categories:
    the scaffolded-owned tables (every scaffold key except the inherited
    DEVICE_METADATA and VERSIONS, created up front via setdefault) and
    the on-demand owned tables (ROUTE_REDISTRIBUTE, the SNMP_* tables and
    SYSLOG_SERVER, which helpers assign only when NetBox carries their
    data).
  - BGP_GLOBALS['default'] was written with key-level updates that merged
    into any pre-existing entry, so operator fields survived regen.
    Replace the entry wholesale when a primary IP is present, like every
    other generated VRF.
  - STATIC_ROUTE merged the management default route into the dict loaded
    from config_db.json, so pre-existing routes (e.g. an operator's
    custom blackhole or VRF route) survived regen.  Clearing the owned
    tables now drops them before the management route is written.

Add tests pinning the model.  Illustrative helper tests
(test_config_generator_ownership.py) show that _add_vrf_configuration and
_add_vlan_configuration replace BGP_GLOBALS[vrf], VLAN entries and
ROUTE_REDISTRIBUTE keys wholesale.  Orchestrator tests
(test_config_generator_orchestrator.py) assert that pre-existing
BGP_GLOBALS['default'] fields, STATIC_ROUTE entries, and stale owned-table
entries are dropped on regen while DEVICE_METADATA and VERSIONS survive.

The ownership tests also pin the table-classification taxonomy: that the
scaffold set partitions into scaffolded-owned and inherited tables, that
owned and inherited tables are disjoint, that OWNED_TABLE_KEYS has no
duplicates, and that the scaffolded-owned and on-demand owned sets do not
overlap.  Most of the taxonomy holds by construction (OWNED_TABLE_KEYS is
derived), so these checks exist to fail when the hand-written
INHERITED_TABLE_KEYS or ON_DEMAND_OWNED_TABLE_KEYS literals are edited in
a way that would silently break the model.

AI-assisted: Claude Code
Signed-off-by: Roger Luethi <[email protected]>
ideaship added a commit that referenced this pull request May 29, 2026
The SONiC config generator owns config_db.json in full: it generates the
file from NetBox data and hardcoded SONiC policy, and operators must not
make manual adjustments to it.  This ownership model was implicit and
undocumented, so each PR touching the generator relitigated which
sections it owns (surfaced in review of #2279 and #2290).

Document the model in the generate_sonic_config() docstring as whole-file
ownership: the generator overwrites every section it manages, and no
sections pass operator edits through unchanged.  Note the two fields the
generator does not itself emit and instead inherits from the device's
image-provided base config (the DEVICE_METADATA localhost device
attributes and the DATABASE schema VERSION); these come from the image,
not from operator hand-edits.

Bring the code into line with the documented model:

  - Rebuild owned tables from scratch instead of overwriting only the
    entries touched in the current run.  generate_sonic_config() deep-
    copies the base config and most helpers assign only the keys they
    currently generate, so an entry removed from NetBox but still present
    in config_db.json (e.g. an old VLAN_MEMBER, BGP_GLOBALS VRF, or
    VXLAN_TUNNEL_MAP) survived regen as stale, active config.  Drop the
    content of every owned table up front and only regenerated entries
    remain.  OWNED_TABLE_KEYS is composed from two named sub-categories:
    the scaffolded-owned tables (every scaffold key except the inherited
    DEVICE_METADATA and VERSIONS, created up front via setdefault) and
    the on-demand owned tables (ROUTE_REDISTRIBUTE, the SNMP_* tables and
    SYSLOG_SERVER, which helpers assign only when NetBox carries their
    data).
  - BGP_GLOBALS['default'] was written with key-level updates that merged
    into any pre-existing entry, so operator fields survived regen.
    Replace the entry wholesale when a primary IP is present, like every
    other generated VRF.
  - STATIC_ROUTE merged the management default route into the dict loaded
    from config_db.json, so pre-existing routes (e.g. an operator's
    custom blackhole or VRF route) survived regen.  Clearing the owned
    tables now drops them before the management route is written.

Add tests pinning the model.  Illustrative helper tests
(test_config_generator_ownership.py) show that _add_vrf_configuration and
_add_vlan_configuration replace BGP_GLOBALS[vrf], VLAN entries and
ROUTE_REDISTRIBUTE keys wholesale.  Orchestrator tests
(test_config_generator_orchestrator.py) assert that pre-existing
BGP_GLOBALS['default'] fields, STATIC_ROUTE entries, and stale owned-table
entries are dropped on regen while DEVICE_METADATA and VERSIONS survive.

The ownership tests also pin the table-classification taxonomy: that the
scaffold set partitions into scaffolded-owned and inherited tables, that
owned and inherited tables are disjoint, that OWNED_TABLE_KEYS has no
duplicates, and that the scaffolded-owned and on-demand owned sets do not
overlap.  Most of the taxonomy holds by construction (OWNED_TABLE_KEYS is
derived), so these checks exist to fail when the hand-written
INHERITED_TABLE_KEYS or ON_DEMAND_OWNED_TABLE_KEYS literals are edited in
a way that would silently break the model.

AI-assisted: Claude Code
Signed-off-by: Roger Luethi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

3 participants