How BGP Graceful Restart and Stale Routes Are Processed in Versa Operating System (VOS):
Purpose:
This article explains BGP graceful restart behavior and how VOS processes active and stale routes.
Table of Contents:
Case 1: Working with BGP graceful restart functionality.
Case 2: Effects of aggressive BGP flaps on B2B connectivity.
Case 3: Route selection for the VPNv4 and versa-private address families.
Case 1: BGP Graceful Restart:
Graceful restart (GR) is configured under BGP neighbors by default and is advertised as a BGP GR capability between BGP-speaking nodes (branches and controllers). The purpose of GR is to support headless operation, so that branches can keep communicating without interruption for a given time frame. If controller connectivity is lost, or the controller nodes themselves go down, branch routers maintain their forwarding state, and traffic between branch devices is not affected until the GR stale path timer expires.
This is achieved through the graceful-restart capability: when the BGP connection with the controllers is interrupted, previously learned routes are marked as stale and kept in the routing/forwarding table.
Use the following command to check whether BGP graceful restart has been triggered. Once the BGP connection goes down, the GR stale path timer starts counting down. By default, the stale path timer is set to 28800 seconds (8 hours). In the meantime, all previously received routes are marked stale and traffic continues to be forwarded as before. If the BGP connection is not restored before the GR stale path timer counts down to zero, all stale routes are deleted.
admin@cpe1-cli> show bgp graceful-restart brief
routing-instance: Tenant-1-Control-VR
Neighbor    GR-Cap    StaleNlri    GR-Time/Stale-Path-Time
10.0.0.2    Recvd     --           28600    <<<<<<<<<<<
10.0.0.6    Recvd     --           --
Routing entry for 172.16.20.0/24
Peer Address : 10.0.0.2
Route Distinguisher: 12L:102
Next-hop : 10.0.0.10
VPN Label : 24704
Local Preference : 110
AS Path : N/A
Origin : Igp
MED : 0
Community : 8009:8009 8015:0
Extended community : target:12L:12
Preference : Default
Weight : 0
Graceful Restart : Stale <<<<<<
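The remaining stale path time can also be read programmatically. The following is a minimal Python sketch, assuming the `show bgp graceful-restart brief` output has been captured as text and follows the four-column layout shown in the sample above. The function names `parse_gr_brief` and `format_remaining` are hypothetical helpers, not part of VOS.

```python
import re

# Sample text copied from the CLI output above (annotation arrows removed).
SAMPLE = """\
routing-instance: Tenant-1-Control-VR
Neighbor GR-Cap StaleNlri GR-Time/Stale-Path-Time
10.0.0.2 Recvd -- 28600
10.0.0.6 Recvd -- --
"""

# Assumed row layout: neighbor IP, GR capability, stale NLRI, timer.
ROW = re.compile(r"^(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_gr_brief(text):
    """Map each neighbor to its remaining timer in seconds, or None for '--'."""
    timers = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip the routing-instance and header lines
        neighbor, _gr_cap, _stale_nlri, timer = match.groups()
        timers[neighbor] = int(timer) if timer.isdigit() else None
    return timers


def format_remaining(seconds):
    """Render a second count as hours, minutes and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    for neighbor, remaining in parse_gr_brief(SAMPLE).items():
        if remaining is None:
            print(f"{neighbor}: no stale path timer running")
        else:
            print(f"{neighbor}: stale routes kept for {format_remaining(remaining)}")
```

In the sample, neighbor 10.0.0.2 has a running timer, which indicates that its session is down and its routes are being held as stale, while 10.0.0.6 shows no timer.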
Case 2: Effects of Aggressive BGP Flap:
When the BGP connection goes down, the device retains all previously learned routes, marks them as stale, and continues forwarding. We have seen cases where BGP flaps between the controller and branch devices eventually impacted branch-to-branch (B2B) traffic. With GR in place, B2B traffic would be expected to remain unaffected. However, it is important to note that BGP graceful restart retains stale routes across only one BGP flap.
When the BGP connection is restored, the controller waits for the defer time to complete, then runs best path selection and advertises the new prefixes to its peers. This gives the BGP node enough time to sync routes from all other peers before it forwards the best route. Meanwhile, branches still hold the stale routes and continue forwarding traffic. The BGP defer time is set to 240 seconds by default.
If a second BGP flap occurs within the defer time, the branch devices will not yet have received new route advertisements, because the controller is still waiting for the defer time to complete. Having experienced a second BGP flap without receiving new routes, a branch device deletes its existing stale routes. As a result, branch-to-branch tunnels can be expected to go down when multiple BGP flaps occur within a short time frame.
graceful-restart {
    enable;
    maximum-restart-time 3600;
    recovery-time 3600;
    select-defer-time 30;
    stalepath-time 3600;
    multiplier 8;
}
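The interaction between the defer time and consecutive flaps can be illustrated with a small simulation. This is a simplified Python sketch of the behavior described in this case, not the VOS implementation: the class `BranchGrModel` and its methods are hypothetical, time is a plain second counter, and the defaults (240 and 28800 seconds) are the default values quoted in this article.

```python
class BranchGrModel:
    """Toy model of a branch holding routes learned from one controller."""

    def __init__(self, prefixes, defer_time=240, stale_path_time=28800):
        self.defer_time = defer_time
        self.stale_path_time = stale_path_time
        self.known = list(prefixes)          # what the controller re-advertises
        self.routes = {p: "active" for p in prefixes}
        self.down_at = None                  # time the session went down
        self.up_at = None                    # time the session came back up

    def _advance(self, now):
        """Apply any timer that has expired by `now`."""
        if self.down_at is not None:
            if now - self.down_at >= self.stale_path_time:
                # Stale path timer expired: stale routes are deleted.
                self.routes = {p: s for p, s in self.routes.items() if s != "stale"}
        elif self.up_at is not None and now - self.up_at >= self.defer_time:
            # Defer time completed: controller advertises, routes are refreshed.
            self.routes = {p: "active" for p in self.known}
            self.up_at = None

    def session_down(self, now):
        self._advance(now)
        # Routes still stale from an earlier flap were never refreshed: delete them.
        self.routes = {p: s for p, s in self.routes.items() if s != "stale"}
        # Routes that were active are retained and marked stale.
        self.routes = {p: "stale" for p in self.routes}
        self.down_at, self.up_at = now, None

    def session_up(self, now):
        self._advance(now)
        self.down_at, self.up_at = None, now

    def state(self, now):
        self._advance(now)
        return dict(self.routes)


if __name__ == "__main__":
    single = BranchGrModel(["172.16.10.0/24"])
    single.session_down(0)
    single.session_up(60)
    print("single flap, after defer time:", single.state(300))

    double = BranchGrModel(["172.16.10.0/24"])
    double.session_down(0)
    double.session_up(60)
    double.session_down(100)   # second flap inside the 240 s defer time
    print("second flap within defer time:", double.state(100))
```

In the single-flap run the route stays stale until the defer time completes and is then refreshed. In the double-flap run the route is removed at the second flap, which corresponds to the B2B tunnel loss described above.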
You can retrieve the route history from the branch device to see the route add/delete activity on the box.
vsm-vcsn0> show vunet route history | grep 172.16.10.0
172.16.10.0/24 10.0.0.8 op=del kop=RTM_DELETE err=0 rtt=11 flags=0x2 ifindex=1058 n_out_labels=0 label=0 next-fib=1024, Fri Sep 30 11:03:03 2022
172.16.10.0/24 10.0.0.8 op=add kop=RTM_ADD err=0 rtt=11 flags=0x3 ifindex=1058 n_out_labels=1 label=24704 next-fib=1024, Fri Sep 30 10:41:40 2022
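To correlate route deletions with BGP flaps, the history lines can be sorted and the interval between an add and the following delete computed. The sketch below is a minimal Python example that assumes each history line has the layout shown above (prefix, next hop, `op=` field, then a comma followed by the timestamp). `parse_history` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Sample lines copied from the route history output above.
HISTORY = """\
172.16.10.0/24 10.0.0.8 op=del kop=RTM_DELETE err=0 rtt=11 flags=0x2 ifindex=1058 n_out_labels=0 label=0 next-fib=1024, Fri Sep 30 11:03:03 2022
172.16.10.0/24 10.0.0.8 op=add kop=RTM_ADD err=0 rtt=11 flags=0x3 ifindex=1058 n_out_labels=1 label=24704 next-fib=1024, Fri Sep 30 10:41:40 2022
"""

# Assumed layout: "<prefix> <next-hop> op=<op> ... , <timestamp>"
LINE = re.compile(
    r"^(?P<prefix>\S+)\s+(?P<nexthop>\S+)\s+op=(?P<op>\w+).*,\s*(?P<ts>.+)$"
)


def parse_history(text):
    """Return (timestamp, op, prefix, nexthop) tuples, oldest first."""
    events = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # ignore lines that do not look like history entries
        stamp = datetime.strptime(match["ts"], "%a %b %d %H:%M:%S %Y")
        events.append((stamp, match["op"], match["prefix"], match["nexthop"]))
    return sorted(events)


if __name__ == "__main__":
    events = parse_history(HISTORY)
    for stamp, op, prefix, nexthop in events:
        print(f"{stamp.isoformat()} {op:<3} {prefix} via {nexthop}")
    # Time the route stayed installed between the add and the delete.
    installed = events[1][0] - events[0][0]
    print("installed for", int(installed.total_seconds()), "seconds")
```

For the two sample entries, the route was added at 10:41:40 and deleted at 11:03:03, about 21 minutes later. Comparing such delete timestamps with the BGP session flap times helps confirm whether a second flap inside the defer time removed the stale route.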
Case 3: Route selection for the VPNv4 and versa-private address families:
1> Branches advertise L3VPN prefixes to the controllers, and the prefixes reach the remote branches through the VPNv4 address family. If at any point a controller loses connectivity with a branch, all routes received from that branch are marked stale; however, they are still advertised to remote peers unchanged. BGP best path selection does not treat a stale route as less preferred.
For example, if the BGP connection goes down with only one controller and the secondary controller stays active, the branch device does not treat the stale route from controller 1 as less preferred.
2> Some SD-WAN updates are communicated between branch devices over the versa-private address family.
For example, branch devices perform an IPsec rekey according to the configured rekey timer and communicate it in a BGP update using the versa-private family. BGP GR behavior for such route updates differs from that for normal L3VPN prefix updates. In this case, when a branch loses its BGP connection with one controller and stays up with the secondary controller, the branch device prefers the BGP update from the secondary controller over the stale routes from controller 1.
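The difference between the two address families can be sketched as a selection function in which the stale flag is ignored for VPNv4 but counts against a path for versa-private. This is an illustrative Python sketch only: the `Path` type, the `select_best` function, and the tie-break order (higher local preference, then lowest peer address as a string) are assumptions made for the example and do not describe the actual VOS best path algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    peer: str         # controller the path was learned from
    local_pref: int
    stale: bool       # True if the session to that controller is down


def select_best(paths, family):
    """Pick one path; staleness matters only for the versa-private family."""

    def sort_key(path):
        # Assumption: for versa-private a fresh path always beats a stale one;
        # for vpnv4 the stale flag does not take part in the comparison.
        stale_penalty = path.stale if family == "versa-private" else False
        # Simplified tie-break: higher local preference, then lowest peer string.
        return (stale_penalty, -path.local_pref, path.peer)

    return min(paths, key=sort_key)


if __name__ == "__main__":
    paths = [
        Path(peer="10.0.0.2", local_pref=110, stale=True),   # controller 1, session down
        Path(peer="10.0.0.6", local_pref=110, stale=False),  # secondary controller, up
    ]
    print("vpnv4         ->", select_best(paths, "vpnv4").peer)
    print("versa-private ->", select_best(paths, "versa-private").peer)
```

With equal attributes, the VPNv4 selection can still return the stale path from controller 1, here only because of the assumed tie-break, whereas the versa-private selection returns the fresh update from the secondary controller.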