The challenges posed by big data have persisted for decades, continuously evolving alongside the ever-changing definitions of what constitutes “big” data. These challenges are compounded by market changes in which major cloud service providers have stopped offering free and/or unlimited storage to customers. Recognizing the significance and broad impact of these issues on the Internet2 community and beyond, Internet2 has embarked on a new, cross-cutting initiative aimed at establishing a community working group. This group will assess R&E strategies to manage, curate, and access big data, aligning with the larger theme of Innovation and Transformation.
Some efforts to support enterprise and research storage needs are already underway. A Cloud Storage Working Group was created in March 2021 to address the technical and policy implications of the ‘end of unlimited storage’ and of the creation and sprawl of online course content during the COVID-19 pandemic. That group developed many resources and presentations, and its efforts culminated in Internet2 issuing a Request for Information (RFI) in early 2023 for cloud storage migration tools. These tools would help institutions move large volumes of data off enterprise file sync and share solutions and other Software as a Service (SaaS) applications and onto more cost-effective storage solutions.
To address these challenges comprehensively, Internet2 will leverage its existing ties to relevant communities and organizations while also establishing new connections where necessary. The primary objective is to support the formation of a working group tasked with conducting an assessment of the current state of the research data challenge. The working group will collaborate closely with various stakeholders to gather insights and perspectives. Subsequently, the group will deliver a comprehensive report to the Internet2 community, presenting its findings and recommendations.
This initiative represents a vital aspect of Internet2’s Research Engagement Program, which continually addresses the broader challenges of Research Computing and Data (RCD). The working group’s report will not only provide a detailed analysis of the research data challenge but also offer recommendations for Internet2’s future actions. It will identify potential areas for further exploration, highlight gaps in existing strategies, and suggest priority challenges that warrant attention. Moreover, the report will propose which individuals or organizations should take responsibility for addressing these challenges.
In addition to the report, the working group will curate and disseminate a range of resources, guidelines, and case studies. These materials will serve as valuable references for the Internet2 community and beyond, enabling researchers and educators to navigate the complexities of big data management effectively.
Because this topic is being discussed so widely, Internet2 acknowledges the many conversations already under way in other communities. Groups such as CaRCC, Champions, the Common Solutions Group (CSG), the Research Data Alliance (RDA), RDAP, ADSA, U.S. Research Software Engineer Association (US-RSE), EDUCAUSE, the Coalition for Networked Information (CNI), ACCESS, the Science Gateways Community Institute (SGCI), CI Compass, RRCoP, BDHubs, the Coalition for Academic Scientific Computation (CASC), the Eastern Regional Network (ERN), and more have been actively engaging in these discussions. Recognizing the expertise and involvement of libraries in data curation, Internet2 aims to foster stronger connections between the Internet2 community and the libraries at its member institutions.
The timeline will depend on the availability of resources and on coordination efforts within Internet2. An Internet2 coordinator will be essential to facilitate the formation and functioning of the working group. Collaborative analysis of the research data ecosystem and identification of gaps will be crucial steps, followed by the production of the report.
While this initiative aligns with the community’s concerns, it is essential to note that the challenges of big data management are not entirely new. The primary goal of this initiative is to reassure the individuals who raised the concerns by demonstrating that numerous groups are actively addressing these challenges. The working group will shed light on the complex nature of the issue and emphasize that it extends beyond a mere storage problem.
Alongside this particular effort, Internet2 will continue to engage with the RCD community and support ongoing projects like Nexus and CaRCC. Internet2’s role in this space goes beyond the provision of tools; it encompasses a collaborative approach that involves network, identity, security, and cloud services.
In conclusion, the establishment of the Community Working Group to assess R&E strategies for managing big data is a significant step for Internet2. It represents a commitment to innovation and transformation within the research and education sector. By convening relevant stakeholders, analyzing the current state of the research data challenge, and delivering a comprehensive report with actionable recommendations, Internet2 aims to shape the future of research and education in an increasingly data-driven world.
Status: Internet2 senior management is exploring avenues to create such a group. Some of this work is ongoing in a NET+ cloud services context and is being tracked and advised by the NET+ Program Advisory Group (PAG), the Cloud Services, Technology and Architecture Committee. Relevant NET+ Service Advisory Boards are also directly and indirectly exploring complementary solutions. Nascent discussions have taken place regarding the possible creation of a standing research advisory group that could adopt this as an area to monitor continually and on which to make recommendations.