HIVE : A Warehousing Tool

Hive is a data warehouse infrastructure tool used for processing structured data in Hadoop. Primarily used to summarize and manage big data, Hive makes querying and analysis easier. Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive is a powerful tool for ETL. It is, however, relatively slow compared with traditional […]

Let’s Understand Data Lake, Data Warehouse and Database

“Data lakes, data warehouses, and databases” are all terms used in data management. But what exactly do they mean, and are they the same or different? Let’s explore that in this article. We will start with the definitions, then discuss the key differences. A database is generic data storage and […]

How to Build Big Data Analytics Infrastructure

Ref: https://www.datasciencecentral.com/profiles/blogs/big-data-analytics-infrastructure

Big data can bring huge benefits to businesses of all sizes. However, as with any business project, proper preparation and planning are essential, especially when it comes to infrastructure. Until recently it was hard for companies to get into big data without making heavy infrastructure investments (expensive data warehouses, software, analytics staff, etc.). But […]

Difference between AI, Machine Learning, Deep Learning and Statistics

Artificial intelligence, machine learning, and deep learning are frequently used terms today. Some also relate statistics to AI, machine learning, and deep learning. Everyone wants to know how they differ and how they are related, and here I try to answer that question. Before starting with statistics, I explain Artificial Intelligence, Machine […]

Big Data, Hadoop and Spark

Big data is an important technology because it allows large amounts of information, both structured and unstructured, to be analyzed quickly. MapReduce is a paradigm that allows computation to be done in a distributed and parallel manner. It is cheaper to run 1,000 machines with 1 GB of RAM each than to buy a single machine with 1,000 GB of RAM. […]
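
To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch in Python that counts words. It is only an illustration of the map, shuffle, and reduce steps, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example. On a real cluster, each chunk would be mapped on a different machine in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run the three steps in sequence (a cluster would run the maps in parallel)."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input: each string stands in for a block stored on one machine.
    print(word_count(["big data is big", "data is everywhere"]))
```

Because each call to `map_phase` depends only on its own chunk, the chunks can be spread across many small machines, which is what makes the paradigm a good fit for commodity hardware.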

What is Apache Spark?

Apache Spark is an emerging platform that has more flexibility than MapReduce but more structure than a basic message-passing interface. It relies on the concept of distributed data structures (which it calls RDDs, short for resilient distributed datasets) and operators. Because Spark is a lower-level layer that sits on top of a message-passing interface, it has higher level […]

Hadoop – An Open Source Framework

Hadoop is a MapReduce framework that enables processing large datasets in parallel on clusters of commodity hardware. This is cheaper, as it is an open-source solution that can run on commodity hardware while handling petabytes of data. It is faster on massive data volumes because data processing is done in parallel. A complete Hadoop MapReduce based solution […]

Introduction To MongoDB

MongoDB is an open-source, cross-platform NoSQL database program, and among the most popular of its kind. Databases, collections, and documents are the core terms in MongoDB. Each database has collections, which in turn have documents. Data is stored as JSON-style documents (analogous to rows). It is useful for building scalable websites with millions of users. MongoDB – Relational […]
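
The database, collection, and document hierarchy can be pictured with plain Python data structures. The sketch below is only an in-memory illustration and does not use MongoDB or its driver: the database name `shop_db`, the collection `users`, the sample documents, and the helper `find` are all hypothetical. It also shows that documents in one collection do not need to share the same fields.

```python
import json

# Hypothetical database: a mapping from collection names to lists of documents.
shop_db = {
    "users": [
        {"_id": 1, "name": "Asha", "roles": ["admin"]},
        {"_id": 2, "name": "Ravi", "email": "ravi@example.com"},
    ]
}


def find(collection, **criteria):
    """Return every document whose fields match all of the given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    # Each document serializes naturally to JSON, much like a row in a table.
    for doc in find(shop_db["users"], name="Ravi"):
        print(json.dumps(doc))
```

A relational table would force both users into the same set of columns; here each document simply carries the fields it needs.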

What are MySQL and NoSQL

MySQL – A relational database: MySQL is a relational database written in C and C++. It is easy to use and open source. Along with PHP, it can be used to create dynamic server-side applications. It supports many operating systems and many programming languages, such as C, C++, Java, and PHP. Features of MySQL: Easy to use with […]
