Speakers:
- Jeronimo Bezerra, Florida International University
- Renata Frez, Florida International University
- Italo Valcy, Florida International University
- Tiago Monsores, RedCLARA
Slides:
Abstract:
Talk 1: Detecting microbursts is an ongoing challenge for research and education networks (RENs) and commercial internet service providers. Microbursts are sporadic bursts of traffic that occur on very short timescales (hundreds of milliseconds) and often go undetected by conventional network monitoring tools. They impact data transfers and cause costly performance problems in long-haul and regional networks. Microburst detection has become an increasingly popular topic with the availability of programmable network devices based on the Intel Tofino ASIC and P4.
However, most solutions leverage the resources in programmable network devices by adding extra stages to the forwarding pipeline, which is not always possible in RENs. RENs, such as Internet2, ESnet, and AmLight, must support a variety of network protocols and functions and can’t always be customized to do microburst detection directly in the forwarding plane. For most network operators, network telemetry solutions and microburst detection are performed out-of-band, leveraging technologies such as In-band Network Telemetry (INT) and traffic mirroring. INT enables network operators to detect microbursts by measuring bandwidth utilization in sub-second intervals.
In previous Internet2 Technology Exchange conferences, AmLight described its experience instrumenting its production long-haul research and education network with INT, including its open-source solution, the AmLight INT Collector. At INDIS 2021, with INT enabled, AmLight demonstrated how microbursts are monitored by measuring bandwidth utilization over intervals of 100 ms to 500 ms. However, during operation we learned that even a short interval such as 100 ms is not enough to detect some of the microbursts observed. AmLight needed a shorter bandwidth utilization measurement interval to detect microbursts lasting only tens of milliseconds.
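To illustrate why a 100 ms interval can hide a short microburst, the following minimal sketch averages the same traffic over two interval sizes. The traffic profile (a 1 Gbps baseline with a 20 ms burst at 10 Gbps) and the helper names `rebin` and `utilization_gbps` are assumptions chosen for illustration; they are not AmLight measurements or part of the AmLight INT Collector.

```python
# Hypothetical traffic, one byte count per 1 ms slot over 100 ms:
# 1 Gbps baseline (125,000 bytes/ms) with a 20 ms burst at 10 Gbps.
slot_bytes = [1_250_000 if 40 <= ms < 60 else 125_000 for ms in range(100)]


def rebin(samples, factor):
    """Sum consecutive groups of `factor` 1 ms samples into wider intervals."""
    return [sum(samples[i:i + factor]) for i in range(0, len(samples), factor)]


def utilization_gbps(byte_counts, interval_ms):
    """Average rate in Gbps for each interval of `interval_ms` milliseconds."""
    return [count * 8 / (interval_ms * 1e6) for count in byte_counts]


coarse = utilization_gbps(rebin(slot_bytes, 100), 100)  # one 100 ms interval
fine = utilization_gbps(rebin(slot_bytes, 20), 20)      # five 20 ms intervals

print("100 ms view:", coarse)
print(" 20 ms view:", fine)
```

Under these assumed numbers, the 100 ms view reports an average of about 2.8 Gbps, while the 20 ms view exposes the 10 Gbps peak in the interval that contains the burst.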
Reducing the bandwidth utilization measurement interval below 100 ms incurs multiple tradeoffs: (1) storing massive amounts of data for analysis, even though microbursts are sporadic events; (2) still missing microbursts on even shorter timescales, because the measurement intervals are predefined and fixed; (3) performance issues from increased CPU, I/O, and disk usage, possibly leading to loss of accuracy; and (4) the risk of slowing network troubleshooting activities, because of the delay when plotting graphs with tens of thousands of measurement points.
Based on our experience managing the tradeoff between storing granular counters and the bandwidth utilization measurement interval, we believe that an efficient solution should evaluate whether counters need to be stored at all, rather than storing them and deleting them later. This is the objective of our adaptive and efficient solution: to collect and process network counters every few milliseconds, but store them only when there is a clear indication of a microburst. Counters are evaluated against a set of operator-defined metrics, for instance the minimum and maximum bandwidth utilization measurement intervals for data gathering, and thresholds that detect a microburst by measuring the traffic increase since the last data gathering. Upon microburst detection, the solution adapts the interval between two consecutive data gathering operations based on historical measurements. The AmLight INT Collector was enhanced to detect microbursts as short as 20 ms while preserving disk space and CPU cycles. The table below shows an example of 13 microbursts detected, some as short as 20 ms and reaching multiple Gbps.
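The evaluate-before-storing idea described above can be sketched roughly as follows. This is not the AmLight INT Collector implementation: the class `AdaptiveCollector`, its parameters, the halving and doubling of the interval, and the simulated traffic are all hypothetical choices made for illustration. The abstract says only that the interval is adapted based on historical measurements, so the simple rule used here is an assumption.

```python
class AdaptiveCollector:
    """Hypothetical sketch: evaluate every sample, store only burst samples."""

    def __init__(self, min_interval_ms=5, max_interval_ms=20, increase_factor=2.0):
        # Assumed operator-defined metrics (names and defaults are illustrative).
        self.min_interval_ms = min_interval_ms
        self.max_interval_ms = max_interval_ms
        self.increase_factor = increase_factor  # burst if rate >= factor * baseline
        self.interval_ms = max_interval_ms
        self.baseline_gbps = None               # last rate seen outside a burst
        self.evaluated = 0
        self.stored = []                        # (end_ms, interval_ms, rate_gbps)

    def observe(self, end_ms, byte_delta):
        """Evaluate bytes counted over the current interval; return the next one."""
        self.evaluated += 1
        rate_gbps = byte_delta * 8 / (self.interval_ms * 1e6)
        in_burst = (self.baseline_gbps is not None
                    and self.baseline_gbps > 0
                    and rate_gbps >= self.baseline_gbps * self.increase_factor)
        if in_burst:
            # Keep the sample and look more closely (assumed rule: halve).
            self.stored.append((end_ms, self.interval_ms, rate_gbps))
            self.interval_ms = max(self.min_interval_ms, self.interval_ms // 2)
        else:
            # Discard the sample and relax the interval (assumed rule: double).
            self.baseline_gbps = rate_gbps
            self.interval_ms = min(self.max_interval_ms, self.interval_ms * 2)
        return self.interval_ms


def bytes_between(start_ms, end_ms):
    """Simulated link: 1 Gbps baseline, 10 Gbps burst during [410, 430) ms."""
    return sum(1_250_000 if 410 <= ms < 430 else 125_000
               for ms in range(start_ms, end_ms))


collector = AdaptiveCollector()
t = 0
while t < 1000:
    step = collector.interval_ms
    collector.observe(t + step, bytes_between(t, t + step))
    t += step

print(f"evaluated {collector.evaluated} samples, stored {len(collector.stored)}")
for end_ms, interval_ms, rate in collector.stored:
    print(f"burst sample ending at {end_ms} ms over {interval_ms} ms: {rate:.1f} Gbps")
```

In this simulated second of traffic, only the samples that overlap the burst are stored and every other sample is evaluated and discarded, which is the disk-space saving the approach aims for. A production collector would read counters from INT telemetry reports rather than from a simulated function.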
Talk 2: Network visibility has always been one of the network features most requested by RedCLARA users. Being aware of everything moving within and through the network should be the new normal. Network visibility makes it possible to:
• Understand where an NREN's (National Research and Education Network's) data is and how it is used
• Identify where network traffic is coming from and going to
• Determine what user behavior is normal and abnormal
• Know what software is in use on the network
• Locate vulnerabilities or misconfigurations on the network
• Proactively detect network outages and performance issues
Network visibility covers a lot of ground, but its definition is actually rather simple. The term refers to being aware of everything within and moving through the network with the help of network visibility tools. These tools keep a close and constant eye on network traffic, monitored applications, network performance, managed network resources and big data analytics, which in turn requires effective and scalable data collection, aggregation, distribution and delivery.
Network visibility, however, is not a passive function, as it allows you to exert greater control over all these aspects. The more in-depth, proactive and extensive your network visibility, the more control you have over your network data, and the better the decisions you can make regarding the flow and protection of that data.
With this concept in mind, the RedCLARA Network Engineering Group (NEG) and Systems Engineering Group (SEG) have built the Integrated Monitoring Portal (IMP). Its goal is to become the best network monitoring and visualization tool for Latin American NRENs, where users can visualize and analyze network traffic on each of the configured VRFs, network latency, packet errors, packet discards, BGP state and uptime, service availability, the number of accepted and denied BGP prefixes, and network alerts, among many others.
NRENs can also visualize information about network flows and thereby determine the most used IP addresses, autonomous systems (AS), services and protocols, as well as detect network threats and view traffic origin and destination, geolocation information and so forth.
In addition, users can visualize important information about RedCLARA’s backbone. A Network WeatherMap has been included, and each of the international backbone circuits can be monitored using IMP. This brings valuable information to network operators, who need real-time data for troubleshooting network issues and evaluating network performance. Information about eduroam for each of RedCLARA’s associates is also integrated into IMP.
For increased visibility, an SSH Tool was also added to the platform. It allows network operators to run ping and traceroute from any router in the network, along with an extensive subset of show commands for viewing interface status and the operation of a diverse stack of protocols such as IPv4, IPv6, ARP, BGP, IS-IS, MPLS, L2VPN, L3VPN, IGMP, PIM, MSDP, etc.
In recent years, the RedCLARA NEG has put immense effort into studying, evaluating and implementing the most modern network monitoring, administration and visualization tools, and the Integrated Monitoring Portal can be seen as the consolidation of all this work.
As the TechEX23 audience is composed of research and education network operators, network administrators and network users, we expect this presentation to be of interest to everyone attending the conference.