Internet data consumption per existing subscriber continues to grow at about 50% per year, a rate at which volumes double roughly every 18 to 20 months. Both the median and the average monthly download volumes at the connection level are increasing at this 50% rate. Growth in the median reflects broad adoption of higher-bandwidth applications by mainstream Internet users, while growth in the average is driven largely by the heaviest users. The distribution of monthly download volumes is heavily skewed to the right: the top 1% of Internet users download one or more orders of magnitude more than the median user.
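The growth and tail figures above can be checked with a short Python sketch. It assumes one monthly download volume per connection as input; `doubling_time_months` and `tail_summary` are hypothetical helper names, not part of any published toolkit. Solving (1 + g)^t = 2 shows that exactly 50% annual growth gives a doubling time of about 20.5 months, while an 18-month doubling corresponds to roughly 59% annual growth.

```python
import math
import statistics


def doubling_time_months(annual_growth):
    """Months for volume to double at a constant annual growth rate.

    Solves (1 + g) ** t = 2 for t in years, then converts to months.
    """
    return 12 * math.log(2) / math.log(1 + annual_growth)


def tail_summary(monthly_gb):
    """Compare the median, the mean, and the heaviest 1% of connections.

    monthly_gb: one monthly download volume per connection (hypothetical input).
    """
    ordered = sorted(monthly_gb)
    k = max(1, len(ordered) // 100)  # size of the top-1% group (at least one)
    top_mean = statistics.fmean(ordered[-k:])
    median = statistics.median(ordered)
    return {
        "median": median,
        "mean": statistics.fmean(ordered),  # pulled upward by the heavy tail
        "top1_mean": top_mean,
        "top1_to_median": top_mean / median,
    }


if __name__ == "__main__":
    print(f"{doubling_time_months(0.50):.1f} months")  # 20.5 months
    # Toy data: 99 connections at 10 GB and one at 1000 GB.
    print(tail_summary([10.0] * 99 + [1000.0]))
```

In the toy data a single heavy connection roughly doubles the mean while leaving the median untouched, which is why the two statistics are worth tracking separately.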
The evolution of these usage distributions has significant implications for capacity, pricing, and network management policy. Bandwidth and volume trends must be tracked to develop capacity dimensioning guidelines that keep pace with changing traffic demand. A decade ago, the only data available to measure Internet usage on a wide scale was bytes transferred per time period. That level of measurement is no longer sufficient to support optimal network design. As use of the Internet evolves, the data collected about Internet traffic must evolve in parallel to ensure application performance and to keep Internet access affordable.
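As one illustration of how a tracked growth rate feeds a dimensioning guideline, the sketch below estimates how many months remain before busy-hour demand on a link reaches a planning threshold. The 80% utilization trigger and the function name `months_until_exhausted` are hypothetical assumptions for illustration, not a rule stated in this abstract.

```python
import math


def months_until_exhausted(current_peak_mbps, capacity_mbps, annual_growth,
                           target_utilization=0.8):
    """Months until busy-hour demand reaches target_utilization of capacity.

    Assumes demand compounds smoothly at a constant annual growth rate.
    target_utilization=0.8 is a hypothetical planning threshold.
    """
    limit = capacity_mbps * target_utilization
    if current_peak_mbps >= limit:
        return 0.0  # already at or past the upgrade trigger
    # Convert the annual rate into an equivalent monthly compounding factor.
    monthly_factor = (1 + annual_growth) ** (1 / 12)
    return math.log(limit / current_peak_mbps) / math.log(monthly_factor)


if __name__ == "__main__":
    # 400 Mb/s busy-hour peak on a 1 Gb/s link, growing 50% per year.
    print(f"{months_until_exhausted(400, 1000, 0.50):.1f} months")
```

With the trigger at 800 Mb/s the demand has to double, so the answer matches the roughly 20-month doubling time implied by 50% annual growth.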
The increases in Internet traffic demand are best explained by the following accelerators: rapid expansion of bandwidth-intensive content, improved video quality, faster connection speeds, a growing variety of devices that access content from the Internet, and economic factors. As content selection improves and customers are exposed to less expensive over-the-top video that can be both time- and location-shifted, customers are cutting the cord by cancelling their existing cable or satellite video services. Faster speeds and adaptive streaming now provide an excellent video experience that is beginning to integrate with traditional broadcast television. Long-form commercial video content, not just short-form user-generated content, can be displayed on large-screen TVs via gaming consoles (Xbox, Wii, PlayStation), smartphones, and other media devices such as Roku, Apple TV, iPad, Boxee, Slingbox, or Blu-ray players. Internet service providers need to understand how the Internet is being used so that they can adapt their services to changing market demands. Measuring and forecasting based on generic units of usage is no longer adequate.
The Internet has become a revolutionary converged network providing phone and video chat, video entertainment, online education, information, news, and social networking. It carries applications historically carried by physically separate voice, cable television, and traditional web data networks. These applications have varying communication value and place a range of capacity and performance demands on the network. Because different traffic types are no longer carried over distinct networks, there is a need to understand the composition of traffic in this common network.
Fully understanding what drives bandwidth demand requires collecting measurements that carry more information than the bits physically transferred. For strategic planning and design, as well as for new service creation, being able to characterize how customers use and access the Internet is essential. Recent market reports state that approximately 50% of all wireline Internet traffic terminates at devices over Wi-Fi (laptops, tablets, gaming consoles, and smartphones). Video now accounts for nearly 70% of downloaded Internet traffic. The percentage of subscribers using the Internet during traditional TV viewing hours has increased significantly in the past 2 years. All of these facts affect how an Internet service provider develops its products and designs its network. Just 5 years ago, the main drivers of bandwidth demand were non-interactive applications such as P2P and newsgroup downloads; now the primary contributor is video during TV viewing hours.
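The shift from a whole-day view to a busy-hour view can be sketched in a few lines of Python. This assumes each record is a hypothetical `(hour_of_day, category, bytes)` tuple that some upstream classifier has already labeled with an application category; that classification step is outside the scope of the sketch.

```python
from collections import defaultdict


def busy_hour_shares(records):
    """Return (busy_hour, whole-day shares, busy-hour shares) by category.

    records: iterable of (hour_of_day, category, bytes) tuples (hypothetical schema).
    """
    by_hour = defaultdict(float)
    by_cat = defaultdict(float)
    by_hour_cat = defaultdict(float)
    for hour, category, nbytes in records:
        by_hour[hour] += nbytes
        by_cat[category] += nbytes
        by_hour_cat[(hour, category)] += nbytes

    # The busy hour is the hour of day carrying the most total traffic.
    busy = max(by_hour, key=by_hour.get)
    day_total = sum(by_cat.values())
    day_shares = {c: v / day_total for c, v in by_cat.items()}
    busy_shares = {c: v / by_hour[busy]
                   for (h, c), v in by_hour_cat.items() if h == busy}
    return busy, day_shares, busy_shares


if __name__ == "__main__":
    toy = [(3, "p2p", 50), (3, "video", 10), (20, "video", 80), (20, "web", 20)]
    print(busy_hour_shares(toy))
```

Because network capacity is dimensioned for the busy hour, a category's share at that hour (80% video in the toy data) matters more for planning than its share of the daily total (about 56% here).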
The boundaries in service delivery between applications, content, devices, operating systems, and connectivity are blurring. Describing trends in basic physical characteristics such as data volumes is incomplete without knowing how they relate to services in a converged network. When someone asks for the top ten services on the Internet, what do they mean? What is a service? Is a service defined by the name of the commercial provider (Netflix, Google, Facebook, iTunes) or by its service category (e-mail, P2P)? Does "top" mean top in popularity, measured by number of users, or top in the bandwidth required to deliver the content? The answers depend on who is asking. A marketing department may care about the number of subscribers using an application, while network architects are concerned with the network capacity required to meet bandwidth demand. Regulators may be concerned with net neutrality policy. So how Internet traffic should be characterized depends on the audience.
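The point that "top" depends on the metric can be made concrete: the same records produce different rankings when services are scored by distinct users or by total bytes. The `(subscriber_id, service, bytes)` record layout and the function name `top_services` are hypothetical, chosen only for this sketch.

```python
from collections import defaultdict


def top_services(records, n=10, by="bytes"):
    """Rank services by total bytes or by number of distinct users.

    records: iterable of (subscriber_id, service, bytes) tuples (hypothetical schema).
    """
    total_bytes = defaultdict(int)
    users = defaultdict(set)
    for subscriber, service, nbytes in records:
        total_bytes[service] += nbytes
        users[service].add(subscriber)

    if by == "bytes":
        score = dict(total_bytes)
    elif by == "users":
        score = {service: len(subs) for service, subs in users.items()}
    else:
        raise ValueError("by must be 'bytes' or 'users'")
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(score, key=lambda s: (-score[s], s))[:n]


if __name__ == "__main__":
    toy = [("a", "email", 1), ("b", "email", 1), ("c", "email", 1),
           ("a", "video", 500), ("b", "video", 400)]
    print(top_services(toy, by="users"))  # ['email', 'video']
    print(top_services(toy, by="bytes"))  # ['video', 'email']
```

The marketing view (users) and the capacity view (bytes) reverse the order even on five records, which is why the question has to be pinned down before the data is aggregated.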
The implication is that granular data must be available so it can be aggregated flexibly to answer questions about Internet application use. A service comprises elements that provide value, such as connectivity, content type, content provider, and device. From the same set of data there are myriad ways to display usage patterns for a service from the perspective of any one of these elements. As traditional communications networks collapse into a common one, understanding the services it carries is essential for the Internet provider. Understanding the traffic requires gathering and analyzing volumes of data that were unfathomable, and unnecessary, a decade ago.
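Keeping records at a granular level means any service element can become the grouping key after the fact. The sketch below sums bytes by one field or by a combination of fields; the field names `provider`, `device`, `access`, and `bytes` are hypothetical placeholders for whatever metadata a provider actually collects.

```python
from collections import Counter


def volume_by(records, *dims):
    """Sum bytes grouped by one or more record fields.

    records: iterable of dicts with a "bytes" field plus metadata fields
    (hypothetical schema). With one dim the keys are plain values; with
    several dims the keys are tuples, giving a cross-tabulation.
    """
    if not dims:
        raise ValueError("need at least one dimension to group by")
    totals = Counter()
    for rec in records:
        key = rec[dims[0]] if len(dims) == 1 else tuple(rec[d] for d in dims)
        totals[key] += rec["bytes"]
    return dict(totals)


if __name__ == "__main__":
    toy = [
        {"provider": "A", "device": "tv", "access": "wifi", "bytes": 100},
        {"provider": "A", "device": "phone", "access": "wifi", "bytes": 30},
        {"provider": "B", "device": "tv", "access": "wired", "bytes": 70},
    ]
    print(volume_by(toy, "provider"))
    print(volume_by(toy, "access", "device"))
```

Pre-aggregating to any one of these views would discard the others, so the sketch deliberately works from the raw records each time.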
Examples of simple and more complex characterizations of Internet usage will be presented through different data visualization approaches. Physical statistics, such as daily or monthly usage patterns, will be displayed along with drill-downs into statistics associated with metadata such as connection speed or network topology. Creativity and software are required to extract the relevant information from large volumes of raw data. Case studies using R and Python Pandas will illustrate the process of gathering and organizing data, from the original records through to the answer to a strategic question.
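A drill-down from original records to a strategic answer usually has two aggregation steps: roll raw records up to one total per subscriber per month, then summarize those totals by a metadata field. The sketch below does this with the Python standard library rather than pandas, for median monthly usage by connection speed tier; the CSV columns and the function name `median_usage_by_tier` are hypothetical. In pandas the same two steps would be two `groupby` aggregations.

```python
import csv
import io
import statistics
from collections import defaultdict


def median_usage_by_tier(csv_text):
    """Median monthly bytes per subscriber, broken down by speed tier.

    Expects CSV columns subscriber, month, speed_tier, bytes (hypothetical schema).
    """
    # Step 1: roll raw records up to one total per (subscriber, month).
    per_sub_month = defaultdict(int)
    tier_of = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["subscriber"], row["month"])
        per_sub_month[key] += int(row["bytes"])
        tier_of[key] = row["speed_tier"]  # last tier seen wins if it changes mid-month

    # Step 2: group the monthly totals by tier and take the median.
    by_tier = defaultdict(list)
    for key, total in per_sub_month.items():
        by_tier[tier_of[key]].append(total)
    return {tier: statistics.median(v) for tier, v in by_tier.items()}


if __name__ == "__main__":
    sample = (
        "subscriber,month,speed_tier,bytes\n"
        "s1,2013-01,10M,5\n"
        "s1,2013-01,10M,7\n"
        "s2,2013-01,10M,20\n"
        "s3,2013-01,40M,50\n"
        "s4,2013-01,40M,70\n"
        "s5,2013-01,40M,90\n"
    )
    print(median_usage_by_tier(sample))
```

Taking the median per subscriber-month, rather than summing all records in a tier, keeps the result comparable across tiers with very different subscriber counts.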
Amie Elcan has worked in the telecommunications industry for over 20 years delivering traffic-based assessments that drive optimal network architecture and engineering design decisions. She is currently a Principal Architect at CenturyLink in the Data Network Strategies organization. Her current areas of focus at CenturyLink are traffic modeling, application traffic analytics, and data science.
View a complete list of Strata + Hadoop World 2013 contacts