Big Data Archive

HIVE : A Warehousing Tool

Hive is a data warehouse infrastructure tool used for processing structured data in Hadoop. Used primarily to summarize and manage big data, Hive makes querying and analysis easier. Hive data warehouse software facilitates querying and managing large datasets residing

Let’s Understand Data Lake, Data Warehouse and Database

“Data lakes, data warehouses, and databases”: these are all terms used in data management. But what exactly do they mean, and are they the same or different from each other? Let’s explore in this article.

How to Build Big Data Analytics Infrastructure

Ref: https://www.datasciencecentral.com/profiles/blogs/big-data-analytics-infrastructure Big data can bring huge benefits to businesses of all sizes. However, as with any business project, proper preparation and planning are essential, especially when it comes to infrastructure. Until recently it was hard for companies

Difference between AI, Machine Learning, Deep Learning and Statistics

Artificial intelligence, machine learning, and deep learning are frequently used terms today. Some also relate statistics to AI, machine learning, and deep learning. Everyone wants to know how they differ and how they are related; here I am

How To Transform Current Job Role To Data Scientist / Machine Learning / Full Stack

The demand for data scientists, machine learning engineers, and full stack developers continues to grow, and more and more software engineers working at software companies are switching roles. So what skills are needed to switch roles

Big Data, Hadoop and Spark

Big data is an important technology as it allows large amounts of information – both structured and unstructured – to be analyzed quickly. MapReduce is a paradigm that allows computation to be done in a distributed and parallel manner.
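To make the paradigm concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration, and a real framework would run the map and reduce calls on many machines in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (illustrative, not distributed)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

Because each `map_phase` call depends only on its own line, and each `reduce_phase` call depends only on its own key, both steps could in principle be spread across workers; that independence is what lets the computation be distributed.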

What is Apache Spark?

Apache Spark is an emerging platform that has more flexibility than MapReduce but more structure than a basic message passing interface. It relies on the concept of distributed data structures (what it calls RDDs) and operators. Because Spark

Hadoop – An Open Source Framework

Hadoop is a MapReduce framework that enables processing large datasets in parallel on clusters of commodity hardware. This keeps costs down, as it’s an open source solution that can run on commodity hardware while handling petabytes of data. It’s

Introduction To MongoDB

MongoDB is an open source, cross-platform database program and one of the most popular NoSQL databases. Databases, collections, and documents are the core terminology in MongoDB. Each database has collections, which in turn have documents. The data stored is in the form of

What is MySQL and NoSQL

MySQL – A relational database: MySQL is a relational database written in C and C++. It is easy to use and open source. Along with PHP, it can be used to create dynamic server-side applications. It supports many operating