The External and Internal Characteristics of Big Data

Big data resembles a flood of data, and the amount of data grows day by day. The term refers to data of enormous extent. This data can be structured, unstructured or semi-structured. Structured data consists of text files that can be displayed in rows and columns, so it is easy to process. Unstructured data is the opposite: it cannot be viewed in a relational database. Examples of unstructured data include word processing documents, presentations, audio, video, email and many other business documents. The third category is semi-structured data, which includes XML, JSON and NoSQL databases.

The term big data is strongly linked to unstructured data: about 80% of the data in big data is said to be unstructured. In practice, big data refers to data that is not managed by traditional databases. A traditional database system stores data on the order of gigabytes, while big data systems store petabytes, exabytes, zettabytes and more. Companies need to retain or hire highly experienced staff to gain deep analytical insight into big data. Big data keeps growing on the most popular social sites, such as Facebook and Twitter.

How big data is understood differs depending on the business, technology and industry concerned. McKinsey identified five sectors where data is growing rapidly: healthcare, the public sector, retail, manufacturing and personal location data. The main advantages of big data are scalability and data analysis. Examples of big data are found in real-life scenarios such as banking, social media, web data and everyday transactions of any kind.

Big data is commonly defined by five Vs: volume, variety, velocity, veracity and value. The five Vs are explained here in simple language.

Volume: In big data, the word “big” refers to volume.
In the future, data will be measured in zettabytes. Social networking sites share a large amount of data. Here are some statistics that show the volume of data. According to real-time Internet statistics, every second there are:

64,551 Google searches
7,886 tweets on Twitter
822 Instagram photos uploaded
72,179 YouTube videos viewed
2,655,007 emails sent, including spam
52,180 GB of Internet traffic

In addition, every minute Facebook users share 2.5 million pieces of content and 571 websites are created.

Variety: As discussed earlier, data comes in structured, semi-structured and unstructured types. These types of data are difficult to manage with a traditional database system. This multiplicity of data types is what variety refers to. A lot of structured data is generated today.

Velocity: Velocity is the speed at which data is created. Examples of data generated by social networking sites include tweets on Twitter and statuses, comments and shares on Facebook, among many others. Data is generated in real time, near real time, in batches, or hourly, daily, weekly, monthly, yearly and so on.

Veracity: Veracity is how well the data conforms to the truth. Its attributes include the accuracy, integrity and authenticity of the data. It concerns data uncertainty, that is, whether the data has been verified or not.

Vagueness: Vagueness is the confusion surrounding big data. Several tools are used to manage big data, and it is often unclear which one to choose: Hadoop, Hive, MapReduce, Apache Pig or another? Here.
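
As a rough illustration of the scale described under Volume, the per-second figures quoted there can be scaled up to daily totals. The sketch below assumes that each rate stays constant over a whole day and that storage units are decimal (1 EB = 10^9 GB). Neither assumption comes from the original statistics, and the names `PER_SECOND`, `per_day` and `humanize_gb` are purely illustrative.

```python
# Per-second figures quoted in the Volume discussion.
PER_SECOND = {
    "Google searches": 64_551,
    "tweets": 7_886,
    "Instagram photos uploaded": 822,
    "YouTube videos viewed": 72_179,
    "emails sent": 2_655_007,
    "GB of Internet traffic": 52_180,
}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Decimal (SI) storage units expressed in gigabytes, smallest to largest.
# Assumption: the essay does not say whether it means decimal or binary units.
GB_PER_UNIT = {"GB": 1, "TB": 10**3, "PB": 10**6, "EB": 10**9, "ZB": 10**12}


def per_day(rate_per_second: int) -> int:
    """Scale a per-second rate to a daily total, assuming a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


def humanize_gb(gigabytes: int) -> str:
    """Express a gigabyte count in the largest unit that keeps the value >= 1."""
    unit = "GB"
    for name, size in GB_PER_UNIT.items():
        if gigabytes >= size:
            unit = name
    return f"{gigabytes / GB_PER_UNIT[unit]:.2f} {unit}"


daily = {name: per_day(rate) for name, rate in PER_SECOND.items()}

if __name__ == "__main__":
    for name, total in daily.items():
        print(f"{name}: {total:,} per day")
    print("Traffic per day:", humanize_gb(daily["GB of Internet traffic"]))
```

Under these assumptions, 52,180 GB of traffic per second comes to roughly 4.5 exabytes per day, which suggests why big data is discussed in petabytes and beyond rather than in gigabytes.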