Super Women in Supercomputing: Robin Goldstone (Lawrence Livermore National Laboratory) and Jini Ramprakash (Argonne Leadership Computing Facility) talk about next-gen supercomputing, and how their teams are using open source and continuous integration.
Cisco & Princeton Host First North American ODL /dev/boot
Last week at Princeton University, forty of the industry’s top SDN developers, network architects, and students came together to lift the game of open SDN co-development in the first North American ODL /dev/boot.
Participants from AT&T, Comcast, Verizon, and Cisco joined forces with students and professors from Boston University, Columbia, Cornell, Princeton, and Rutgers to dive deep into ODL architecture, use cases, and code. “We all brought our best coders and architects, our brightest ideas, and our biggest problems; and we tackled them together,” says Kristen Wright, Director, Cisco Research & Open Innovation. “Couple that with some of academia’s brightest young minds—and magic happens.”
The /dev/boot action learning event included architecture deep-dives, training, labs, and design sessions; and culminated in a 2.5-day “hack-a-thon” in which participants teamed up to co-develop ODL features, infrastructure modifications, and applications that were deemed top business priorities. “Only in open source can users become an integral part of their own feature development. All of these projects produced usable and useful code,” says Jan Medved, Distinguished Engineer in Cisco’s Chief Technology & Architecture Office.
Cisco expects several of the projects from /dev/boot to land in the open source community.
Cisco has held other /dev/boot events in Beijing, and at HackZurich; and will continue to shepherd this new Open Innovation trend around the world. Interested in /dev/boot? Contact email@example.com. Also check out our GitHub repository for a closer look at this cohort’s /dev/boot projects.
(L to R) Kevin Boutarel, Mike Kowal, Yueping Zhang, Oleg Berzin (not pictured)
SDN REMOTE BLACK HOLE (“Flow-Spec” SDN-Programmatic Flow Management)
FIRST PLACE WINNER
The first-place winners were Kevin Boutarel (Princeton University), Oleg Berzin (Verizon), Mike Kowal (Cisco), and Yueping Zhang (Verizon), for their demonstration of ODL capabilities for programmatic control of flows in the network using BGP flow-spec over BGPv4. Check it out on GitHub.
NB CLUSTER REDIRECT
INFRASTRUCTURE INNOVATION AWARD
Dave Stilwell (AT&T) and Dan Timoney (AT&T) were honored with the Infrastructure Innovation award for providing the ability to redirect requests in a cluster to the shard leader. Check it out on GitHub.
(L to R) Dave Stilwell & Dan Timoney
(L to R) Stas Isakov, Mark Barrasso, Larry Zhou
SNOW PLOW IoT
ENTREPRENEURSHIP AWARD
Mark Barrasso (Boston University), Stas Isakov (Cisco), and Larry Zhou (AT&T) took home the Entrepreneurship award for their innovative, user-friendly, and utilitarian IoT application for real-time tracking of snow plows and weather updates. Check it out on GitHub.
DESIGN & CODE AWARD
Tuan Duong (AT&T), Jill Jermyn (Columbia University), and Suja Srinivasan (Rutgers University) won the Design & Code award for their creative and well-thought-out modification of the method for attaching BGP community values to routes. Check it out on GitHub.
(L to R) Jill Jermyn, Suja Srinivasan, Tuan Duong
FULL PROJECT LIST
(Alphabetical by Project Name)
SDN-controlled method for attaching community values to routes
Traditional method: configure prefix lists and match policies to set community values on routes. Problem: this requires a different policy configuration on every router.
This experiment: divert the route-reflection path through the SDN controller, which attaches the community values to routes centrally.
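For readers unfamiliar with communities: a BGP community (RFC 1997) is just a 32-bit tag attached to a route, conventionally written ASN:value with the AS number in the high 16 bits. The minimal sketch below shows that encoding; it is illustrative only, not the project's actual code.

```python
def encode_community(asn: int, value: int) -> int:
    """Pack an ASN:value pair into a 32-bit BGP community (RFC 1997)."""
    return (asn << 16) | value

def decode_community(community: int) -> str:
    """Render a 32-bit community in the conventional ASN:value notation."""
    return f"{community >> 16}:{community & 0xFFFF}"

# Example: a private-range ASN tagging routes with an arbitrary value
c = encode_community(65000, 100)
# decode_community(c) → "65000:100"
```

With the controller in the route-reflection path, a tag like this can be attached in one place instead of being configured on every router.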
odl-kafka-plugin is a northbound plugin that allows real-time (or near-real-time) event or telemetry data to stream into a big-data platform such as PaNDA. The key design goal of this plugin is to provide a generic and configurable data connector that subscribes to southbound event source(s) via ODL's Event Topic Broker (ETB) on one side and forwards notifications to a Kafka endpoint on the other.
Craig Riecke (Cornell), Lee Sattler (Verizon), Wuyang Zhang (Rutgers)
October 14-16 marked the annual Grace Hopper Celebration of Women in Computing. Held in Houston, Texas, it was packed with cool sessions and strong women who are passionate, witty, and experts in their fields.
Among those women is a small contingent of hard-core supercomputing experts from our national laboratories. One of their themes this week was open source, and they had some really interesting things to say!
Robin Goldstone—a computer scientist from the Lawrence Livermore National Laboratory (LLNL)—talked about the evolution of supercomputing at LLNL, and about their plans to build a next-gen 150 petaFLOP HPC cluster—Sierra. (That’s 150x10^15 floating point operations per second.) Sierra is what LLNL calls an “Advanced Technology System” (ATS)—a vendor-supported supercomputer designed to run a single, very large problem.
Robin also discussed some of the challenges of taking HPC ATS systems to the next level, which is exascale computing. Power will be one major challenge. “We could build an exascale system today, but it would consume around 100 megawatts of power.” Lawrence Livermore, along with the other national laboratories, is working closely with computer vendors to improve the power efficiency of future HPC systems. The integration of accelerators, such as GPUs, has also been difficult to date due to their segregation from the rest of the compute architecture, Robin noted, but Sierra will utilize some of NVIDIA’s new GPU advances, in which the accelerators are much more tightly coupled with the CPU. IBM and NVIDIA will be LLNL’s development partners for this next-gen ATS system.
Robin and team also manage “Capacity Computing Systems.” These are open source clusters that run the majority of the LLNL workloads, and are running the LLNL homegrown HPC software stack, CHAOS (Clustered High Availability Operating System). CHAOS has given the LLNL sysadmins a way to manage multiple Linux clusters on mostly-commodity compute with efficiency, reliability, and speed. In short, LLNL developers have enhanced Red Hat Enterprise Linux to better support HPC. They have integrated the Lustre open source high-performance file system and InfiniBand interconnect, created new packages for cluster management & monitoring, and they regularly contribute their kernel patches and other HPC software back to the open source community. They also developed SLURM (Simple Linux Utility for Resource Management) for fault-tolerant, highly scalable resource management and job scheduling.
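To give a feel for how SLURM is driven in practice, here is a minimal Python sketch that assembles an `sbatch` submission command. The script name, partition, and job sizes are hypothetical examples, not LLNL's actual configuration.

```python
import subprocess

def build_sbatch_cmd(script, nodes=4, partition="pbatch", time="01:00:00"):
    """Construct an sbatch command line for submitting a SLURM batch job.

    --nodes, --partition, and --time are standard sbatch options; the
    defaults here are illustrative assumptions only.
    """
    return ["sbatch", f"--nodes={nodes}", f"--partition={partition}",
            f"--time={time}", script]

cmd = build_sbatch_cmd("run_sim.sh", nodes=16)
# On a cluster with SLURM installed, subprocess.run(cmd) would queue the job.
```

The scheduler then places the job on free nodes and enforces the requested limits, which is exactly the fault-tolerant resource management role SLURM plays on LLNL's clusters.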
Continuous Integration (CI), as a named practice, originated with the work of Grady Booch in the early 90s, and has largely been adopted and evolved out of the Agile and Extreme Programming methodologies of the past 25 years or so. Agile CI in supercomputing may sound like an oxymoron, but Sreeranjani (Jini) Ramprakash—UX specialist / team lead from the Argonne Leadership Computing Facility (ALCF)—dispelled that myth for us this week.
Jini and a few colleagues from other ALCF groups found they were able to form a natural bridge between their teams to bring together several crosscutting projects. They all have “day jobs,” but have organically grown an awesome CI subculture, which is helping their groups to be more efficient, effective, and connected.
It all started with a small group adopting Jenkins—a cross-platform, continuous integration and continuous delivery application—for their business intelligence needs, including setting up an Extract-Transform-Load (ETL) solution. Word started spreading that developers were able to do more of “the fun stuff” because they were spending less time debugging, trying to keep the code base consistent, and server-hopping to diagnose problems from an endless number of log files. Instead, they were pipelining their build and test processes, automating test & deployment, and just generally making the build-test-deploy cycle easier for developers.
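As a rough illustration of what such an ETL step looks like (the table and field names here are invented for the example, not ALCF's actual schema), consider:

```python
import csv
import io
import sqlite3

def run_etl(csv_text: str, db: sqlite3.Connection):
    """A toy Extract-Transform-Load step: read job records from CSV,
    normalize them, and load them into a reporting table."""
    db.execute("CREATE TABLE IF NOT EXISTS jobs (project TEXT, core_hours REAL)")
    for row in csv.DictReader(io.StringIO(csv_text)):            # extract
        core_hours = float(row["node_hours"]) * int(row["cores_per_node"])  # transform
        db.execute("INSERT INTO jobs VALUES (?, ?)",
                   (row["project"].strip().lower(), core_hours))  # load
    db.commit()

db = sqlite3.connect(":memory:")
run_etl("project,node_hours,cores_per_node\nClimate,2.5,16\n", db)
# The jobs table now holds one normalized row for reporting.
```

Under CI, a step like this runs automatically on every change, so broken transforms are caught by the pipeline rather than by someone hunting through log files.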
What happened next was interesting. Developers on other projects began asking for “exceptions” to use the externally hosted CI tools for other projects and purposes. Word spread, and the ALCF management team decided that the exception should be the rule. Hosting their own Jenkins instance would allow CI to expand organically, and would help them avoid some of the authorization and access backlogs they were facing due to the large number of mounting exceptions—a win-win.
Once the ALCF launched their in-house Jenkins instance, CI spread like wildfire within the facility. As any good engineer would do, Jini began trying to quantify the benefits. What she discovered was that, by participating in this internal open CI community, ALCF developers were helping each other more, projects were more connected, and CI adopters were reporting extreme productivity gains.
Jini attributed the dramatic interest in CI not only to productivity benefits, but also to the developer experience. “It was so much easier for developers. They just started using it more and more,” she said. “An unpredicted benefit was the degree to which it has helped our teams to slice through organizational silos with cross functional teams. They just naturally work together.”
About Robin & Jini
As a member of the Advanced Technologies Office within the Livermore Computing supercomputer center, Robin is involved in developing LLNL's High Performance Computing (HPC) strategy. Focus areas include commodity HPC clusters, architectures for data intensive computing, energy efficiency and Exascale systems.
Sreeranjani (Jini) Ramprakash is the User Experience Specialist/Team Lead at Argonne National Laboratory's Leadership Computing Facility, which operates a supercomputer in support of high-impact scientific research. She leads a team of analysts supporting researchers worldwide, and also helped develop business intelligence systems by modeling data and building software to streamline reporting.
Passionate about engaging girls in STEM activities, she volunteers for Systers, mentors for Google Summer of Code, and helps organize Introduce a Girl to Engineering Day at Argonne.
She has a master’s degree from UT Arlington and a bachelor’s degree from Mangalore University, both in Computer Science and Engineering.
Other Grace Hopper Women in Supercomputing
Other great HPC and supercomputing sessions included Oak Ridge’s Veronica Vergara Larrea talking about Jenkins monitoring, and Suzanne Parete-Koon’s “Tiny Titans” talk on Parallel Computing with DIY Supercomputers & Games.
Gennette Gill, from D.E. Shaw Research, also talked about the Anton Management System, which her team developed in order to address the unique UI and management challenges in a multi-user supercomputing environment.
I was delighted to see this much female supercomputing power at Grace Hopper, and so enjoyed meeting these super-women. I hope their talks have inspired some of the young talent from Grace Hopper to pursue careers in high-performance computing.
About Cisco and HPC
Cisco’s UCS platform is extremely well-suited for what we call “balanced technical computing”—that is, classes of problems which gain the most advantage from a balanced approach to CPU, memory, and I/O “power.” A few examples of this are elliptic curve cryptography, bio-informatics, financial services, data warehousing/mining/search engines, and cloud computing.
The Cisco UCS team has been piloting a production-class balanced technical Computation-as-a-Service (CaaS) offering that has provided many of Cisco’s researchers a high-availability platform on which to solve some pretty interesting real-world problems. (See the research & publication list below.)
Cisco’s Arcetri cluster—the brainchild of Claudio DeSanti (Cisco Fellow) and Landon Curt Noll (Claudio’s “Resident Astronomer”)—was named after the world-renowned Arcetri Astrophysical Observatory on the outskirts of Florence, Italy. The cluster’s beauty lies in its heterogeneity (meaning it can leverage a variety of UCS gear—old, repurposed, and new), its balanced computing performance, its low latency, high throughput, and predictability, and its ultra-high manageability using Cisco’s powerful UCS Manager toolkit and APIs. Claudio’s team has also incorporated Adaptive Computing’s Moab/TORQUE resource scheduler, and Charm++ computation via Ultra Low Latency Ethernet (usNIC).
Research Impact – List of publications (Year 1)
[C1] Y. M. Kang, A. Arbabi, and L. L. Goddard, “Resolving split resonant modes in microrings,” IEEE Photonics Conference, ThP5,
[C2] M. F. Xue and J. M. Jin, “A hybrid nonconformal FETI/conformal FETI-DP method for arbitrary nonoverlapping domain
decomposition modeling,” IEEE AP-S Int. Symp., Orlando, FL, Jul. 2013.
[C3] M. F. Xue and J. M. Jin, “A two-level nested FETI/FETI-DP domain decomposition method,” IEEE USNC-URSI Int. Symp.,
Orlando, FL, Jul. 2013.
Research Impact – List of publications (Year 2)
[J1] M. F. Xue and J. M. Jin, “Application of an oblique absorbing boundary condition in the finite element simulation of phased-array
antennas,” Microw. Opt. Technol. Lett. 56, pp. 178-184, doi: 10.1002/mop.28075 (Nov. 2013).
[J2] Y. M. Kang, M.-F. Xue, A. Arbabi, J.-M. Jin, and L. L. Goddard, “Modal expansion approach for accurately computing resonant
modes in a high Q optical resonator,” Microw. Opt. Technol. Lett. 56, 278-284, doi: 10.1002/mop.28035 (Dec 2013).
[J3] M. F. Xue, Y. M. Kang, A. Arbabi, S. J. McKeown, L. L. Goddard, and J. M. Jin, “Fast and accurate finite element analysis of
large-scale three-dimensional photonic devices with a robust domain decomposition method,” Opt. Express 22, no. 4, pp. 4437-4452,
doi: 10.1364/OE.22.004437 (Feb. 2014).
[C4] M. F. Xue and J. M. Jin, “Nonconformal FETI-DP method combined with second-order transmission condition for large-scale
electromagnetic analysis,” 30th International Review of Progress in Applied Computational Electromagnetics, Jacksonville, FL, Mar.
Internet Data Analytics
Internet Data Analytics allows researchers and engineers to focus on analytics by making data collection and presentation simple and efficient. Internet Data Analytics brings several emerging IP technologies together on a modern software development platform, with rich APIs that open the doors to endless applications for operations, design, engineering, and research.
Why We Built Internet Data Analytics
Routers use Border Gateway Protocol (BGP) to exchange network reachability information. This information has been the subject of academic research since the beginning of the Internet.
Historically, researchers have collected BGP data by directly peering from BGP routers (or vRouters) in their labs to routers participating in the Internet BGP routing exchange. This approach is less than ideal, because it requires manual configuration, and because access to data is typically limited to ASCII or MRT-formatted text files, which can become cumbersome as data sets grow.
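To illustrate the kind of low-level handling raw dumps require, here is a sketch that walks the MRT common header defined in RFC 6396 (timestamp, type, subtype, length, then the message body). The synthetic record below is for demonstration only.

```python
import struct

def iter_mrt_records(blob: bytes):
    """Walk MRT records (RFC 6396): each starts with a 12-byte common
    header (timestamp, type, subtype, length), followed by `length`
    bytes of message body."""
    offset = 0
    while offset + 12 <= len(blob):
        ts, mrt_type, subtype, length = struct.unpack_from(">IHHI", blob, offset)
        yield {"timestamp": ts, "type": mrt_type, "subtype": subtype,
               "body": blob[offset + 12: offset + 12 + length]}
        offset += 12 + length

# One synthetic record: type 16 (BGP4MP) with a 3-byte placeholder body
dump = struct.pack(">IHHI", 1420070400, 16, 4, 3) + b"abc"
records = list(iter_mrt_records(dump))
```

Even this trivial walker shows the problem: every researcher ends up re-implementing framing and decoding before they can ask a single question of the data.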
A more fundamental challenge with historical collection methods is that using a one-to-one collection method (one BGP session per routing data source) makes it difficult to increase the number of collection points without adding significant management burden to network operators.
Internet Data Analytics reduces these operational challenges significantly by efficiently collecting BGP & IGP data from multiple sources, in a one-to-many fashion, without requiring BGP peering. This allows researchers to focus more energy on the data itself, and less on its collection and interpretation.
Internet Data Analytics in a Nutshell
Internet Data Analytics is built upon the BGP Monitoring Protocol (BMP), in which BGP packets are streamed from BMP-enabled routers in the network to a server over a TCP session.
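The framing BMP uses is simple: every message starts with a 6-byte common header carrying the protocol version, total message length, and message type (see RFC 7854). A minimal parsing sketch, for illustration only:

```python
import struct

BMP_MSG_TYPES = {0: "Route Monitoring", 1: "Statistics Report",
                 2: "Peer Down Notification", 3: "Peer Up Notification",
                 4: "Initiation", 5: "Termination", 6: "Route Mirroring"}

def parse_bmp_common_header(data: bytes):
    """Parse the 6-byte BMP common header: version (1 byte),
    message length (4 bytes), message type (1 byte)."""
    version, length, msg_type = struct.unpack(">BIB", data[:6])
    return {"version": version, "length": length,
            "type": BMP_MSG_TYPES.get(msg_type, "Unknown")}

# Example: a version-3 Route Monitoring header (6 bytes, no body)
hdr = parse_bmp_common_header(bytes([3]) + (6).to_bytes(4, "big") + bytes([0]))
```

A collector reads this header off the TCP stream, learns how many bytes to consume next, and dispatches on the message type, which is essentially what the OpenBMP collector described below does at scale.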
Collecting the Data
Internet Data Analytics data collection is done by a lightweight, open source collector daemon from OpenBMP (openbmp.org). OpenBMP receives and parses BGP packets with near-real-time performance, and stores the data in a transactional database with flexible reporting options.
Accessing the Data
Users can access the data directly through a set of REST APIs (using JSON) or through a RESTCONF plugin.
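A hedged sketch of consuming such a JSON response from Python. The response shape and field names here (`v_routes`, `prefix`, `prefix_len`, `origin_as`) are assumptions made for illustration and may not match the actual API schema.

```python
import json
# For a live query one would fetch raw_json with urllib.request from a
# REST endpoint; here we use a canned sample so the sketch is self-contained.

sample_response = json.dumps({
    "v_routes": {"data": [
        # Hypothetical route entry using documentation-reserved values
        {"prefix": "203.0.113.0", "prefix_len": 24, "origin_as": 64500}
    ]}
})

def summarize_routes(raw_json: str):
    """Extract (prefix/len, origin AS) pairs from a JSON route listing."""
    data = json.loads(raw_json)
    return [(f"{r['prefix']}/{r['prefix_len']}", r["origin_as"])
            for r in data["v_routes"]["data"]]

routes = summarize_routes(sample_response)
# routes == [("203.0.113.0/24", 64500)]
```

The point is the workflow: a few lines of JSON handling replace the text-file wrangling that historical collection methods required.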
We are also showcasing a prototype of the RAD Analysis & Visualization UI (RAD-AV) at Cisco Live! San Diego this week. This is a powerful analytics engine and user interface for enhanced visualization of Internet Data Analytics data.
Internet Data Analytics opens the doors for researchers to create exciting new applications without the burden of BGP data collection.
About the PIs
Serpil Bayraktar is a Principal Engineer in the Chief Technology and Architecture Office at Cisco. She is responsible for advancing IP routing technologies and creating a new routing analytics framework. Serpil has more than 24 years of experience in the networking industry and holds a B.S. in EE from Istanbul Technical University.
Declarative and Expressive Approach to Control Forwarding Paths in Carrier-Grade Networks
Renaud Hartert (UCLouvain), Stefano Vissicchio (UCLouvain), Pierre Schaus (UCLouvain), Olivier Bonaventure (UCLouvain), Clarence Filsfils (Cisco Systems Inc), Thomas Telkamp (Cisco Systems Inc), Pierre Francois (IMDEA Networks Institute)
SDN simplifies network management by relying on declarativity (high-level interface) and expressiveness (network flexibility). We propose a solution to support those features while preserving high robustness and scalability as needed in carrier-grade networks. Our solution is based on (i) a two-layer architecture separating connectivity and optimization tasks; and (ii) a centralized optimizer called DEFO, which translates high-level goals expressed almost in natural language into compliant network configurations. Our evaluation on real and synthetic topologies shows that DEFO improves the state of the art by (i) achieving better trade-offs for classic goals covered by previous works, (ii) supporting a larger set of goals (refined traffic engineering and service chaining), and (iii) optimizing large ISP networks in a few seconds. We also quantify the gains of our implementation, running Segment Routing on top of IS-IS, over possible alternatives (RSVP-TE and OpenFlow).
Large-scale measurements of wireless network behavior
Sanjit Biswas (Cisco Meraki), John Bicket (Cisco Meraki), Edmund Wong (Cisco Meraki), Raluca Musaloiu-E (Cisco Meraki), Apurv Bhartia (Cisco Meraki), Dan Aguayo (Cisco Meraki)
Meraki is a cloud-based network management system which provides centralized configuration, monitoring, and network troubleshooting tools across hundreds of thousands of sites worldwide. As part of its architecture, the Meraki system has built a database of time-series measurements of wireless link, client, and application behavior for monitoring and debugging purposes. This paper studies an anonymized subset of measurements, containing data from approximately ten thousand radio access points, tens of thousands of links, and 5.6 million clients from one-week periods in January 2014 and January 2015 to provide a deeper understanding of real world network behavior. This paper observes the following phenomena: wireless network usage continues to grow quickly, driven most by growth in the number of devices connecting to each network. Intermediate link delivery rates are common indoors across a wide range of deployment environments. Typical access points share spectrum with dozens of nearby networks, but the presence of a network on a channel does not predict channel utilization. Most access points see 2.4 GHz channel utilization of 20% or more, with the top decile seeing greater than 50%, and the majority of the channel use contains decodable 802.11 headers.
Cisco uses CyberGrants to manage and review all research proposals. CyberGrants will be upgrading its user interface, launching August 14th. This means there will be some downtime from close of business on 8/7 (EST) through 8/14. We are not enforcing submission deadlines at this time, so proposals will be processed both before and after this period. In the meantime, please send proposals and any questions you may have to firstname.lastname@example.org