When a large file is placed into HDFS, the Hadoop framework splits the file into blocks (128 MB by default). These blocks are distributed across the nodes of the cluster, and each block is also replicated (the default replication factor is 3) so that the failure of a node or the corruption of a block does not result in data loss.
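The arithmetic behind this can be sketched as follows. This is an illustrative calculation, not an HDFS API; the block size and replication factor are the defaults mentioned above:

```python
# Illustrative sketch: how a file is divided into HDFS blocks.
# BLOCK_SIZE and REPLICATION mirror the HDFS defaults (128 MB, 3 copies).
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of the given size occupies."""
    full, remainder = divmod(file_size_bytes, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

file_size = 300 * 1024 * 1024  # a 300 MB file
blocks = split_into_blocks(file_size)
print(len(blocks))                # 3 blocks: 128 MB + 128 MB + 44 MB
print(len(blocks) * REPLICATION)  # 9 physical copies stored across the cluster
```

Note that the last block is usually smaller than the configured block size; HDFS does not pad it out to 128 MB.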
The MapReduce program that processes the data is also distributed across the nodes, ideally onto the same nodes where the data blocks reside, to take advantage of data locality. The data is thus processed in parallel, with different blocks of the file handled simultaneously on different nodes by the MapReduce code.
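A minimal single-machine sketch of the map, shuffle, and reduce phases makes the parallelism easier to see: each map task below could run on a different node against a different block, and only the shuffle brings the intermediate pairs together. The function names are illustrative, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(block_lines):
    """Map: emit (word, 1) for every word in one block's lines."""
    for line in block_lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Two "blocks" mapped independently (in Hadoop: on separate nodes),
# then merged by the shuffle before reduction:
block_a = ["big data big cluster"]
block_b = ["big cluster"]
pairs = list(map_phase(block_a)) + list(map_phase(block_b))
print(reduce_phase(shuffle(pairs)))  # {'big': 3, 'data': 1, 'cluster': 2}
```

The key point is that `map_phase` only ever sees one block, which is why Hadoop can schedule it on whichever node already holds that block.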
YARN is responsible for allocating cluster resources to the various running applications.
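The resources YARN hands out per node are configurable. As a hedged illustration, a `yarn-site.xml` fragment along these lines caps what each NodeManager offers and what a single container may request (the values shown are arbitrary examples, not recommendations):

```xml
<configuration>
  <!-- Memory this NodeManager makes available to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <!-- Largest memory allocation a single container may request -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
</configuration>
```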