Wednesday, September 22, 2010

Sensor Networks Top Social Networks for Big Data

The article “Sensor Networks Top Social Networks for Big Data” discusses the growing problem of data storage. While social networking sites generate enormous amounts of data, sensors embedded in things like airplanes and roadways are generating far more data than any social networking site, so much that it is becoming a problem in its own right. Two issues arise from this growth: can such large quantities of data still be read and analyzed, in other words, will algorithms keep up with the volume, and who actually owns all of this data? As of right now, no one has an exact answer to these questions, but they will only become more pressing in the coming years.

An example of how much data is actually being collected can be seen in the sensors in the engines of Boeing’s 737 jets. For every 30 minutes of flight, 10 terabytes of data are generated by the plane’s engine sensors. According to the Library of Congress’s website, its web archive has collected 160 terabytes of data, so just 8 hours of flight would generate enough data to equal everything stored in that archive.[1] The Boeing example from the article is a very good way of grasping how many terabytes of data engine sensors alone can generate, but the article does not make the connection to the Library of Congress, which would have provided a useful measuring stick for how much data is actually being collected.
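The comparison is easy to verify with a quick back-of-the-envelope calculation; here is a minimal Python sketch using only the two figures cited above (the 10 TB per half hour from the article and the 160 TB Library of Congress figure from [1]):

```python
# Back-of-the-envelope check: how many flight hours of 737 engine-sensor
# data equal the Library of Congress's 160 TB web archive?
TB_PER_HALF_HOUR = 10   # engine-sensor data per 30 minutes of flight (article)
LOC_ARCHIVE_TB = 160    # Library of Congress web archive size [1]

tb_per_hour = TB_PER_HALF_HOUR * 2                    # 20 TB per flight hour
hours_to_match_loc = LOC_ARCHIVE_TB / tb_per_hour

print(f"{hours_to_match_loc:.0f} flight hours = one Library of Congress")
# prints: 8 flight hours = one Library of Congress
```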

Another example of the increase in data being collected can be seen at Google. In 2006, the company stored 850 terabytes of data[2], which may already seem like a lot compared to the Library of Congress, but by 2008 it was reported that Google was processing 20,000 terabytes (or 20 petabytes) of data in a single day[3]. In just over two years, Google’s data volume grew enormously, and now, two years later still, it is hard to imagine how much more it handles each day. Clearly, reading and analyzing massive amounts of data is a growing problem that needs to be continually addressed as data storage increases. On the question of who owns certain public data, one company is looking to capitalize: Microsoft is looking to create an online marketplace where data can be bought and sold. Beyond a company’s own data, any data that is not specifically owned by anyone, such as roadway sensor data, could be up for sale before you know it.
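To put the two Google figures side by side, a small sketch like the one below compares them directly; note that the 2006 number is a total stored and the 2008 number is a daily processing rate, so the ratio is only a rough illustration of the growth:

```python
# Rough comparison of the two cited Google figures.
GOOGLE_2006_TOTAL_TB = 850      # total data stored, 2006 [2]
GOOGLE_2008_DAILY_TB = 20_000   # data processed per day, 2008 [3]

ratio = GOOGLE_2008_DAILY_TB / GOOGLE_2006_TOTAL_TB
print(f"One 2008 day is about {ratio:.0f}x the entire 2006 store")
# prints: One 2008 day is about 24x the entire 2006 store
```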

Essentially, data growth is becoming a problem, but it can also be turned to many advantages. Energy, weather, and fuel data are all types of data that could be claimed and used to improve products or predictions. It is a good thing that this problem is being addressed now, but the crunch is far enough off (at least 5 years, according to the article) that it is hard to truly understand what will happen when data grows so massive that it can no longer be read. In the next five years, new algorithms may be developed and more storage may become possible, especially when one considers Moore’s law.
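As a rough illustration of that last point, the sketch below assumes capacity doubles every two years, which is a loose, commonly cited reading of Moore’s law (strictly, the law concerns transistor counts, not storage), and is not a claim made by the article itself:

```python
# Illustration only: projected capacity growth if storage doubled
# every two years (a loose reading of Moore's law; an assumption,
# not a figure from the article).
DOUBLING_PERIOD_YEARS = 2
YEARS_OUT = 5

growth_factor = 2 ** (YEARS_OUT / DOUBLING_PERIOD_YEARS)
print(f"~{growth_factor:.1f}x capacity in {YEARS_OUT} years")
# prints: ~5.7x capacity in 5 years
```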

Article: http://www.businessweek.com/technology/content/sep2010/tc20100914_284956.htm


[1] http://www.loc.gov/webarchiving/faq.html

[2] http://googlesystem.blogspot.com/2006/09/how-much-data-does-google-store.html

[3] http://techcrunch.com/2008/01/09/google-processing-20000-terabytes-a-day-and-growing/
