Career Self-Assessment Test-1



Your personality is a major factor to consider when deciding which careers you might enjoy most. This quick assessment can help you understand how the tasks and work environments of different careers are associated with personality types, and which careers may fit you the best.

STEP ONE: Take the Assessment
In each section below, tick the items you think you would enjoy the most. Tick as many as apply.
“R” Section
Repair a car……………………………………….
Build things with wood……………………………………….
Work outdoors……………………………………….
Study electronics……………………………………….
Arrest lawbreakers……………………………………….
Plant a garden……………………………………….
Work with animals……………………………………….
Operate power tools……………………………………….
Drive a truck……………………………………….
“I” Section
Study causes of diseases……………………………………….
Work on a science project……………………………………….
Study human anatomy……………………………………….
Work in a science lab……………………………………….
Research solutions to environmental problems……………………………………….
Collect minerals and rocks……………………………………….
Study the solar system……………………………………….
Do math problems……………………………………….
Study plants and animals……………………………………….
“A” Section
Sing before the public……………………………………….
Design clothing……………………………………….
Decorate a home or office……………………………………….
Act in or direct a play……………………………………….
Write a poem, story or play……………………………………….
Design a poster……………………………………….
Create a sculpture……………………………………….
Arrange flowers……………………………………….
Make videos……………………………………….
“S” Section
Work with children……………………………………….
Care for a sick person……………………………………….
Help people who are upset……………………………………….
Interview clients……………………………………….
Help a person with disabilities……………………………………….
Work as a volunteer……………………………………….
Study psychology……………………………………….
Make people laugh……………………………………….
Teach teens or adults……………………………………….
“E” Section
Start my own business……………………………………….
Make a speech……………………………………….
Supervise the work of others……………………………………….
Start a club……………………………………….
Save money……………………………………….
Sell things……………………………………….
Lead a meeting……………………………………….
Take charge of a project……………………………………….
Work in a political campaign……………………………………….
“C” Section
Keep detailed reports……………………………………….
Operate business machines……………………………………….
Organize a work area……………………………………….
Take telephone messages……………………………………….
Attend to details……………………………………….
Balance a budget……………………………………….
Use a computer……………………………………….
Proofread a document……………………………………….
Create a filing system……………………………………….
STEP TWO: SCORE YOUR TOTALS
Add up how many items you ticked in each section above and fill in the totals for each in the space provided below.
R...........
I...........
A...........
S...........
E...........
C...........
Each letter represents a career interest category. Choose the letters in which you scored the highest and review the descriptions below to discover possible careers you may want to consider.

STEP THREE: Understand Your Letters-Career Interests and Personality Types
What are your two or three highest scoring interests? Place a mark by your top interest areas.

Realistic: The “Do-ers.” People who enjoy practical, hands-on problems and solutions. May have athletic or mechanical ability. Prefer to work with objects, machines, tools, plants, and/or animals. May prefer to work outdoors. They like to accomplish tasks. They are dependable, punctual, detailed, hard-working, and reliable individuals. Possible careers include mechanic, chef, engineer, police officer, athlete, pilot, soldier, and firefighter.
Investigative: The “Thinkers.” People who enjoy work activities that have to do with ideas and thinking more than with physical activity. They like to observe, learn, investigate, analyze, evaluate, problem-solve. They are scientific and lab-oriented, and are fascinated by how things work. They tend to have logical and mathematical abilities. They are complex, curious, research-oriented, cool, calm and collected individuals. Possible careers include architect, computer scientist, psychologist, doctor, and pharmacist.
Artistic: The “Creators.” People who have artistic, innovative, or intuitive ability and like to work in unstructured situations using imagination and creativity. They like self-expression in their work. Possible careers include musician, artist, interior designer, graphic designer, actor, writer, and lawyer.
Social: The “Helpers.” People who like to work with others by informing, helping, training, teaching, developing, or curing them. Often are skilled with words. They enjoy helping others and have a lot of empathy for the feelings of others. Possible careers include social worker, counselor, occupational therapist, teacher, nurse, librarian, and dental hygienist.
Enterprising: The “Persuaders.” People who enjoy work activities that have to do with starting up and carrying out projects, especially business ventures. They like influencing, persuading, and leading people and making decisions. They may be easily bored and grow restless with routine. They prefer to work in their own unique style and like to take risks. Possible careers include business owner, lawyer, school administrator, sales person, real estate agent, judge, and public relations specialist.
Conventional: The “Organizers.” People who like to work with data, have clerical and/or numerical ability, and who enjoy work activities that follow set procedures and routines. Conventional types are people who are good at coordinating people, places, or events. Possible careers include accountant, secretary, bank teller, dental assistant, and math teacher.

Aptitude Test-2

1) Circle : Circumference : : Square : ?
A.    Volume
B.    Area
C.    Diagonal
D.    Perimeter

2) Receptionist : Office : : Hostess : ?
A.    Aircraft
B.    Crew
C.    Hospital
D.    Airport

3) Antonym of accord
A.    Act
B.    Dissent
C.    Policy
D.    Concord

4) Antonym of elevation
A.    Depreciation
B.    Depreciation
C.    Deflation
D.    Depression
           
5) Synonym of Mayhem
A.    Defeat
B.    Excitement
C.    Havoc
D.    Jubilation

6) Synonym of Revoke
A.    Repudiate
B.    Repeal
C.    Annul
D.    Force

7) grass: soil : : seaweed : ___________
A.    water
B.    river
C.    salty
D.    fish

8) Crumb : Bread ::
A.    Ounce : Unit
B.    Splinter : Wood
C.    Water : Bucket
D.    twine : rope

9) A told B that C is his father's nephew. D is A's cousin but not the brother of C. What relationship is there between D and C ?
A.    Father
B.    Sisters
C.    Aunt
D.    Mother

10) In a family, there are six members A, B, C, D, E and F. A and B are a married couple, A being the male member. D is the only son of C, who is the brother of A. E is the sister of D. B is the daughter-in-law of F, whose husband has died. How is E related to C ?
A.    Nephew
B.    Daughter
C.    Sister
D.    Son-in-Law

11) If cushion is called pillow, pillow is called mat, mat is called bedsheet and bedsheet is called cover, which will be spread on the floor ?
A.    Cover
B.    Bedsheet
C.    Mat
D.    Pillow

12) If clock is called television, television is called radio, radio is called oven, oven is called grinder and grinder is called iron, in what will a lady bake ?
A.    Radio
B.    Oven
C.    Grinder
D.    Iron

In the following passage there are blanks, each of which has been numbered. These numbers are printed below the passage and against each, five words are suggested, one of which fits the blank appropriately. Find out the appropriate word in each case.

One day an expert in time management was …(13)… to a group of business management students and to drive home a point he used an …(14)… they will never forget. As he stood in front of a group of brilliant students he said, "Okay it’s …(15)… for a quiz." He then pulled out a one gallon jar and set it on the table in front of him.

13)
A.    expressing
B.    discussing
C.    speaking
D.    conveying
E.    addressing

14)
A.    illustration
B.    emblem
C.    expression
D.    impression
E.    imagination

15)
A.    scheduled
B.    time
C.    opportunity
D.    usual
E.    ready

16) 6, 11, 21, 36, 56, ?
A.    51
B.    71
C.    81
D.    41

17) 13, 35, 57, 79, 911, ?
A.    1145
B.    1113
C.    1117
D.    1110

18) If a person walks at 14 km/hr instead of 10 km/hr, he would have walked 20 km more. The actual distance travelled by him is
A.    50 km
B.    56 km
C.    70 km
D.    80 km

19) The ratio between the speeds of two trains is 7 : 8. If the second train runs 400 kms in 4 hours, then the speed of the first train is:
A.    70 km/hr
B.    75 km/hr
C.    84 km/hr
D.    87.5 km/hr

20) A square garden has fourteen posts along each side at equal intervals. How many posts are there on all four sides?
A.    56
B.    44
C.    52
D.    60

21) Find the ratio of the purchase price to the selling price if there is a loss of 12 1/2 %.
A.    7 : 8
B.    8 : 7
C.    2 : 25
D.    25 : 2

22) The sum of the present ages of the father and his daughter is 42 years. 7 years later, the father will be 3 times as old as the daughter. The present age of the father is
A.    32
B.    28
C.    35
D.    33

23) A train traveling at 48 kmph completely crosses another train having half its length and traveling in the opposite direction at 42 kmph, in 12 seconds. It also passes a railway platform in 45 seconds. The length of the platform is
A.    550m
B.    600m
C.    450m
D.    400m

24) A train overtakes two persons who are walking in the same direction in which the train is going, at the rate of 2 kmph and 4 kmph and passes them completely in 9 and 10 seconds respectively. The length of the train is
A.    60 m
B.    65 m
C.    50 m
D.    55 m

25) A train 270 m long is moving at a speed of 24 kmph. It will cross a man coming from the opposite direction at a speed of 3 kmph, in
A.    30 s
B.    32 s
C.    34 s
D.    36 s

26) 12 men can complete a work in 18 days. Six days after they started working, 4 more men joined them. In how many days will all of them together complete the remaining work?
A.    10 days
B.    8 days
C.    11 days
D.    9 days

27) A can do a certain work in 25 days which B alone can do in 20 days. A started the work and was joined by B after 10 days. The work was completed in
A.    18 2/3 days
B.    16 2/3 days
C.    14 1/3 days
D.    15 1/3 days

28) A can do 1/3 of the work in 5 days and B can do 2/5 of the work in 10 days. In how many days can A and B together do the work?
A.    9 3/8 days
B.    8 2/3 days
C.    8 3/5 days
D.    9 3/5 days

29) T, R, P, N, L, ?, ?
A.    J, G
B.    J, H
C.    K, H
D.    K, I

30) BD, GI, LN, QS, ?
A.    TV
B.    UW
C.    WX
D.    VX

Aptitude Test-1

Time: 20 minutes

1) AZ, GT, MN, ?, YB
A.    KF
B.    RX
C.    SH
D.    TS

2) Find the odd one out
A.    door: bang
B.    piano: play
C.    drum: beat
D.    rain: patter

3) If wall is called window, window is called door, door is called floor, floor is called roof and roof is called ventilator, what will a person stand on ?
A.    Window
B.    Wall
C.    Floor
D.    Roof

4) There are six persons A, B, C, D, E and F. C is the sister of F. B is the brother of E's husband. D is the father of A and grandfather of F. There are two fathers, three brothers and a mother in the group. Which of the following is a group of brothers ?
A.    ABD
B.    ABF
C.    BFC
D.    BDF

5) A, E, I, O, ?
A.    T
B.    P
C.    G
D.    U

6) 4, 6, 12, 14, 28, 30, ?
A.    32
B.    64
C.    62
D.    60

7) 11, 13, 17, 19, 23, 29, 31, 37, 41, ?
A.    43
B.    47
C.    51
D.    53

8) A man completes a journey in 10 hours. He travels the first half of the journey at the rate of 21 km/hr and the second half at the rate of 24 km/hr. Find the total journey in km.
A.    220 km
B.    224 km
C.    230 km
D.    234 km

9) A train 360 m long is running at a speed of 45 km/hr. In what time will it pass a bridge 140 m long?
A.    40 s
B.    45 s
C.    50 s
D.    55 s

10) A, B and C are employed to do a piece of work for Rs.529. A and B together are supposed to do 19/23 of the work and B and C together 8/23 of the work. What amount should A be paid?
A.    Rs.355
B.    Rs.345
C.    Rs.375
D.    Rs.335

11) A takes 3 days and B takes 2 days to finish a piece of work. Both worked together, finished the work, and got Rs.150. What is the share of A ?
A.    Rs.90
B.    Rs.60
C.    Rs.80
D.    Rs.50

12) It takes one minute to fill 3/7th of a vessel. What is the time taken, in minutes, to fill the whole vessel?
A.    4/3
B.    3/4
C.    7/3
D.    3/2

13) A coin is placed on a plain paper. How many coins of the same size can be placed around it so that each of the coins touches its adjacent ones ?
A.    4
B.    5
C.    6
D.    7

14) If the Republic Day of India in 1980 falls on Saturday, X was born on March 3, 1980 and Y is older than X by four days, then Y’s birthday fell on
A.    Thursday
B.    Friday
C.    Wednesday
D.    None of these

15) ba_ba_bac_acb_cbac
A.    aacb
B.    bbca
C.    ccba
D.    cbac

16) bca_b_aabc _a_caa
A.    acab
B.    bcbb
C.    cbab
D.    ccab

17) COMPETITION : CONTESTANT ::
A.    trial : witness
B.    journey : traveler
C.    royalty : monarch
D.    election : candidate

18) INVENTORY : GOODS ::
A.    agenda : meeting
B.    snapshot : image
C.    ballot : voters
D.    roll : members

19) Find the odd one out
A.    December
B.    February
C.    March
D.    July

20) Find the odd one out
A.    Arc
B.    Diagonal
C.    Radius
D.    Diameter

21) If LBAEHC is the code for BLEACH, then which of the following words is coded as NBOLZKMH?
A.    BNLOKZHM
B.    MANKYJLG
C.    LOBNHMKZ
D.    OBNKZLHM

22) If ELCSUM is the code for MUSCLE, which word has the code LATIPAC?
A.    CAPRICE
B.    CONFESS
C.    CONDUCE
D.    CAPITAL

23) truthfulness: court : : cleanliness : _________
A.    virtue
B.    bath
C.    restaurant
D.    pig

24) lion: animal : : flower : ___________
A.    plant
B.    roots
C.    grass
D.    rose

Directions (25-27): In each of the following questions, a part of the sentence is given in bold. Below each sentence, four choices labelled (A), (B), (C) and (D) are given which can substitute for the part of the sentence in bold. Find the choice which correctly substitutes that part; that choice is the answer. If 'No correction needed' is your answer, the choice is (E).

25) We must take it granted that he will not come for today's function.
A.    have it granted
B.    took it as granted
C.    taking it granted
D.    take it for granted
E.    No correction required

26) She unnecessarily picked up a quarrel with him and left the party.
A.    picking up
B.    picked
C.    picked on
D.    has picked up
E.    No correction required

27) He has the guts to rise from the occasion and come out successfully.
A.    in rising from
B.    to raise with
C.    to rise against
D.    to rise to
E.    No correction required

28) Antonym of "incidental"
A.    permissible
B.    usual
C.    conventional
D.    intentional

29) Synonym of "LAMENT"
A.    Console
B.    Condone
C.    Comment
D.    Complain

30) Synonym of "CANNY"
A.    Stout
B.    Clever
C.    Handsome
D.    Obstinate

Certifications for mobile app developers in India

Android Certified Application Developer

Advanced Training Consultants (ATC) for Android provides the training and certification for Android developers. The Android Certified Application Developer is the entry-level certification for learning to design, build and maintain Android applications, and arguably the best certification you can have in your mobile app developer arsenal.
To earn the certification, candidates must pass the AND 401: Android Application Development exam. The exam consists of 45 multiple-choice questions (MCQs) to be completed within 90 minutes, with a passing score of 70%. The exam content is based on the Android Application Development course. Taking the course is not a prerequisite to sitting for the exam, and self-study guides are available to help prepare.
Upon successful completion of the exam, candidates receive their certificate and ID card online through e-mail within four weeks.

Oracle Java ME Mobile Application Developer

The prerequisites for this exam can be rather expensive, as one must first achieve certification as an Oracle Certified Professional (OCP) Java Programmer (SE5 or SE6) or a Sun Certified Java Programmer (SCJP), any edition.
Candidates are required to clear the Java (ME) Mobile Edition 1 Mobile Application Developer Certified Professional 1Z0-869 exam. In order to prepare for the exam you can take Java ME: Develop Applications for Mobile Phones training. In addition to this, you also need solid hands-on practice or on-the-job experience performing the tasks described in the exam topics.
The Oracle Java ME Mobile Application Developer exam consists of 68 MCQs to be completed within 158 minutes. Clearing the exam will require a minimum score of 58%.

MCSD Windows Store Apps

Microsoft Certified Solutions Developer (MCSD): Windows Store Apps certification confirms that you have the ability to develop fast and fluid Windows 8 apps. There are two paths to this certification: using HTML5 or using C#. The prerequisite for this exam is that you should have one to two years of HTML5 or C# development experience, as well as experience in Windows application development.
Microsoft recommends the HTML5 path for candidates with solid experience using JavaScript or web apps; .NET developers are encouraged to go the C# route. Regardless of the path, you will have to clear three exams. Note that Microsoft does not “identify exam formats or question types” prior to the exam. There are, however, numerous practice tests and study materials, as well as videos of the sorts of exam questions offered. Check out Microsoft’s Virtual Academy for a more complete discussion of the exam, preparation, and sitting for the exam.

Using HTML5
Exam 70-480: Programming in HTML5 with JavaScript and CSS3
Exam 70-481: Essentials of Developing Windows Store Apps Using HTML5 and JavaScript
Exam 70-482: Advanced Windows Store App Development Using HTML5 and JavaScript

Using C#
Exam 70-483: Programming in C#
Exam 70-484: Essentials of Developing Windows Store Apps Using C#
Exam 70-485: Advanced Windows Store App Development Using C#
This certification can be a bit pricey at $150 (USD) per exam, but it is well worth your time and effort. If you are planning a career in mobile app development, this certification will give you a strong edge.

SAP Certified Development Associate – SMP Hybrid and Native Mobile Application Developer

For developers who support SAP/Sybase products, this is a solid intermediate-level certification. Credentialed individuals possess a foundational knowledge of Hybrid and Native mobile application development on the SAP Mobile Platform 2.3. This certification is offered and maintained by Enterprise Mobility, Germany.
Earning this certification will require that you have “several years of practical on-the-job experience.” The exam is made up of 80 MCQs, spread over nine topic areas, to be completed in three hours. A cut score of 65 percent is required.
Enterprise Mobility offers great support for candidates. If you subscribe to the SAP Learning Hub, you will also receive access to relevant training materials and “Learning Rooms,” virtual learning spaces where SAP instructors are available to guide you through the required training.

IBM Certified Mobile Application Developer – Worklight V6.0

This is an intermediate level certification for developers with significant hands-on experience using Worklight V6.0 to develop mobile hybrid and native applications. Achieving this certification will enable you to develop client-side apps, server-side integration and security components as well as test, deploy and manage Worklight V6.0 apps.
To complete this certification, candidates must pass Test C2180-278. The test consists of 54 questions with single or multiple answers. You will have 90 minutes to complete the exam and need a score of 66 percent to clear it.
To prepare for this certification, it is recommended that you be familiar with the job role description and the parameters this certification is based on. You should know the topics outlined in the test objectives/skills measured on the test. To measure your current skill level, you can also take a sample test.

Hadoop & MapReduce Interview Questions and Answers-1

1) What is Hadoop MapReduce?

The Hadoop MapReduce framework is used for processing large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step map and reduce process.
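
For illustration, here is a minimal sketch of the classic word-count job written against the org.apache.hadoop.mapreduce API; the class names are illustrative, not part of any particular distribution.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map step: emit (word, 1) for every word in the input line.
    class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce step: sum the counts emitted for each word.
    class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }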

2) How does Hadoop MapReduce work?

Taking word counting as an example: during the map phase, MapReduce counts the words in each document, while in the reduce phase it aggregates the per-document counts across the entire collection. During the map phase, the input data is divided into splits for analysis by map tasks running in parallel across the Hadoop framework.

3) Explain what is shuffling in MapReduce ?

The process by which the system performs the sort and transfers the map outputs to the reducers as inputs is known as the shuffle.

4) Explain what is distributed Cache in MapReduce Framework ?

Distributed cache is an important feature provided by the MapReduce framework. When you want to share some files across all nodes in a Hadoop cluster, the DistributedCache is used. The files could be executable JAR files or simple properties files.
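
As a sketch (the file path is a placeholder), a file is registered with the cache in the driver and located again inside a task:

    // Driver: ship a file to every node that runs a task for this job.
    job.addCacheFile(new java.net.URI("/user/hadoop/lookup.properties"));

    // Mapper or reducer setup(): the cached files are listed in the context.
    java.net.URI[] cacheFiles = context.getCacheFiles();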

5) Explain what is NameNode in Hadoop?

NameNode in Hadoop is the node where Hadoop stores all the file location information in HDFS (Hadoop Distributed File System). In other words, NameNode is the centrepiece of an HDFS file system. It keeps a record of all the files in the file system and tracks the file data across the cluster or multiple machines.

6) Explain what is JobTracker in Hadoop? What are the actions followed by Hadoop?

In Hadoop, the JobTracker is used for submitting and tracking MapReduce jobs. The JobTracker runs in its own JVM process.

The JobTracker performs the following actions in Hadoop:

    Client applications submit jobs to the JobTracker
    The JobTracker communicates with the NameNode to determine the data location
    The JobTracker locates TaskTracker nodes near the data or with available slots
    It submits the work to the chosen TaskTracker nodes
    When a task fails, the JobTracker is notified and decides how to proceed
    The JobTracker monitors the TaskTracker nodes

7) Explain what is heartbeat in HDFS?

A heartbeat is a signal used between a DataNode and the NameNode, and between a TaskTracker and the JobTracker. If the NameNode or JobTracker does not respond to the signal, it is assumed that there is some issue with the DataNode or TaskTracker.

8) Explain what combiners are and when you should use a combiner in a MapReduce job?

Combiners are used to increase the efficiency of a MapReduce program. The amount of data that needs to be transferred across to the reducers can be reduced with the help of combiners. If the operation performed is commutative and associative, you can use your reducer code as a combiner. The execution of the combiner is not guaranteed in Hadoop.
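
Assuming a word-count style job where the reduce operation is a commutative and associative sum, the reducer can be wired in as the combiner with a single driver setting:

    // Runs the reducer logic on each map's local output before the shuffle,
    // shrinking the data transferred to the real reducers.
    job.setCombinerClass(IntSumReducer.class);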

9) What happens when a datanode fails ?

When a DataNode fails:

    The JobTracker and NameNode detect the failure
    On the failed node, all tasks are re-scheduled
    The NameNode replicates the user’s data to another node

10) Explain what is Speculative Execution?

In Hadoop, during speculative execution, a certain number of duplicate tasks are launched. Multiple copies of the same map or reduce task can be executed on different slave nodes using speculative execution. In simple words, if a particular node is taking a long time to complete a task, Hadoop will create a duplicate task on another node. The copy that finishes first is retained, and the copies that do not finish first are killed.

11) Explain what are the basic parameters of a Mapper?

The basic parameters of a Mapper are

    LongWritable and Text (the input key and value types)
    Text and IntWritable (the output key and value types)

12) Explain what is the function of the MapReduce partitioner?

The function of the MapReduce partitioner is to make sure that all the values of a single key go to the same reducer, which eventually helps to distribute the map output evenly over the reducers.

13) Explain what is difference between an Input Split and HDFS Block?

The logical division of data is known as an InputSplit, while the physical division of data is known as an HDFS block.

14) Explain what happens in TextInputFormat?

In TextInputFormat, each line in the text file is a record. The value is the content of the line, while the key is the byte offset of the line; for instance, key: LongWritable, value: Text.

15) Mention what are the main configuration parameters that the user needs to specify to run a MapReduce job?

The user of the MapReduce framework needs to specify (see the driver sketch after this list):

    Job’s input locations in the distributed file system
    Job’s output location in the distributed file system
    Input format
    Output format
    Class containing the map function
    Class containing the reduce function
    JAR file containing the mapper, reducer and driver classes
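
A minimal driver supplying these parameters might look as follows; the class names and argument layout are illustrative, and the mapper/reducer classes are assumed to exist elsewhere:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);               // JAR with mapper, reducer and driver
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input location in the DFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output location in the DFS
            job.setInputFormatClass(TextInputFormat.class);         // input format
            job.setOutputFormatClass(TextOutputFormat.class);       // output format
            job.setMapperClass(TokenizerMapper.class);              // class containing the map function
            job.setReducerClass(IntSumReducer.class);               // class containing the reduce function
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }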

16) Explain what is WebDAV in Hadoop?

WebDAV is a set of extensions to HTTP that supports editing and updating files. On most operating systems, WebDAV shares can be mounted as filesystems, so it is possible to access HDFS as a standard filesystem by exposing HDFS over WebDAV.

17) Explain what is Sqoop in Hadoop?

Sqoop is a tool used to transfer data between relational database management systems (RDBMS) and Hadoop HDFS. Using Sqoop, data can be imported from an RDBMS like MySQL or Oracle into HDFS, as well as exported from an HDFS file back to an RDBMS.
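
Typical Sqoop invocations look like the following; the connection string, table and directory names are placeholders:

    # Import a table from MySQL into HDFS
    sqoop import --connect jdbc:mysql://dbhost/sales --table orders \
          --username dbuser -P --target-dir /user/hadoop/orders

    # Export an HDFS directory back into an RDBMS table
    sqoop export --connect jdbc:mysql://dbhost/sales --table order_summary \
          --username dbuser -P --export-dir /user/hadoop/summary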

18) Explain how JobTracker schedules a task ?

The TaskTracker sends out heartbeat messages to the JobTracker, usually every few minutes, to make sure that the JobTracker is active and functioning. The message also informs the JobTracker about the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated.

19) Explain what is SequenceFileInputFormat?

SequenceFileInputFormat is used for reading files in sequence. It is a specific compressed binary file format optimized for passing data from the output of one MapReduce job to the input of another MapReduce job.

20) Explain what does conf.setMapperClass do?

Conf.setMapperClass sets the mapper class for the job, along with everything related to the map task, such as reading the data and generating key-value pairs out of the mapper.

21) Explain what is Hadoop?

It is an open-source software framework for storing data and running applications on clusters of commodity hardware.  It provides enormous processing power and massive storage for any type of data.

22) Mention what is the difference between an RDBMS and Hadoop?
RDBMS | Hadoop
RDBMS is a relational database management system | Hadoop is a node-based flat structure
RDBMS is used for OLTP processing | Hadoop is currently used for analytical and big data processing
In RDBMS, the database cluster uses the same data files stored in shared storage | In Hadoop, the storage data can be stored independently in each processing node
You need to preprocess data before storing it | You don’t need to preprocess data before storing it

23) Mention Hadoop core components?

Hadoop core components include,

    HDFS
    MapReduce

24) What is NameNode in Hadoop?

NameNode in Hadoop is where Hadoop stores all the file location information in HDFS. It is the master node on which job tracker runs and consists of metadata.

25) Mention what are the data components used by Hadoop?

Data components used by Hadoop are

    Pig
    Hive

26) Mention what is the data storage component used by Hadoop?

The data storage component used by Hadoop is HBase.

27) Mention what are the most common input formats defined in Hadoop?

The most common input formats defined in Hadoop are:

    TextInputFormat
    KeyValueInputFormat
    SequenceFileInputFormat

28) In Hadoop what is InputSplit?

It splits input files into chunks and assigns each split to a mapper for processing.

29) For a Hadoop job, how will you write a custom partitioner?

To write a custom partitioner for a Hadoop job, you follow the following path (see the sketch after this list):

    Create a new class that extends the Partitioner class
    Override the getPartition method
    In the wrapper that runs the MapReduce job, add the custom partitioner by using the setPartitionerClass method, or add the custom partitioner to the job as a config file
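
A minimal sketch of such a partitioner follows; the class name and hashing choice are illustrative:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Routes each key to a reducer by the hash of the key.
    public class CustomPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // In the driver:
    // job.setPartitionerClass(CustomPartitioner.class);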

30) For a job in Hadoop, is it possible to change the number of mappers to be created?

No, it is not possible to change the number of mappers to be created. The number of mappers is determined by the number of input splits.

31) Explain what is a sequence file in Hadoop?

Sequence files are used to store binary key/value pairs. Unlike a regular compressed file, a sequence file supports splitting even when the data inside the file is compressed.
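
A short sketch of writing a sequence file with the classic API; the path and records are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Append binary (key, value) records; the resulting file stays splittable.
    SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, new Path("/user/hadoop/data.seq"),
                                      Text.class, IntWritable.class);
    try {
        writer.append(new Text("alpha"), new IntWritable(1));
    } finally {
        writer.close();
    }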

32) When Namenode is down what happens to job tracker?

The NameNode is the single point of failure in HDFS, so when the NameNode is down, your cluster is unavailable.

33) Explain how indexing in HDFS is done?

Hadoop has a unique way of indexing. Once the data is stored as per the block size, HDFS keeps storing the last part of the data, which indicates where the next part of the data is.

34) Explain is it possible to search for files using wildcards?

Yes, it is possible to search for files using wildcards.

35) List out Hadoop’s three configuration files?

The three configuration files are

    core-site.xml
    mapred-site.xml
    hdfs-site.xml

36) Explain how you can check whether the NameNode is working, besides using the jps command?

Besides using the jps command, to check whether the NameNode is working you can also use

/etc/init.d/hadoop-0.20-namenode status.

37) Explain what is “map” and what is “reducer” in Hadoop?

In Hadoop, a map is a phase in HDFS query solving. A map reads data from an input location and outputs key/value pairs according to the input type.

In Hadoop, a reducer collects the output generated by the mapper, processes it, and creates a final output of its own.

38) In Hadoop, which file controls reporting in Hadoop?

In Hadoop, the hadoop-metrics.properties file controls reporting.

39) List the network requirements for using Hadoop?

The network requirements for using Hadoop are:

    Password-less SSH connection
    Secure Shell (SSH) for launching server processes

40) Mention what is rack awareness?

Rack awareness is the way in which the NameNode determines how to place blocks, based on the rack definitions.

41) Explain what is a Task Tracker in Hadoop?

A Task Tracker in Hadoop is a slave node daemon in the cluster that accepts tasks from a JobTracker. It also sends out the heartbeat messages to the JobTracker, every few minutes, to confirm that the JobTracker is still alive.

42) Mention what daemons run on a master node and slave nodes?

    The daemon that runs on the master node is the “NameNode”
    The daemons that run on each slave node are the “TaskTracker” and “DataNode”

43) Explain how can you debug Hadoop code?

The popular methods for debugging Hadoop code are:

    By using the web interface provided by the Hadoop framework
    By using Counters

44) Explain what storage and compute nodes are?

    The storage node is the machine or computer where your file system resides to store the data being processed
    The compute node is the computer or machine where your actual business logic is executed.

45) Mention what is the use of Context Object?

The Context Object enables the mapper to interact with the rest of the Hadoop system. It includes configuration data for the job, as well as interfaces which allow it to emit output.

46) Mention what is the next step after Mapper or MapTask?

The next step after the Mapper or MapTask is that the output of the Mapper is sorted, and partitions are created for the output.

47) Mention what is the default partitioner in Hadoop?

In Hadoop, the default partitioner is the “Hash” partitioner.

48) Explain what is the purpose of RecordReader in Hadoop?

In Hadoop, the RecordReader loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper.

49) Explain how is data partitioned before it is sent to the reducer if no custom partitioner is defined in Hadoop?

If no custom partitioner is defined in Hadoop, then a default partitioner computes a hash value for the key and assigns the partition based on the result.

50) Explain what happens when Hadoop spawned 50 tasks for a job and one of the task failed?

Hadoop will restart the task on some other TaskTracker; the job is failed only if the task fails more than the defined limit.

51) Mention what is the best way to copy files between HDFS clusters?

The best way to copy files between HDFS clusters is by using multiple nodes and the distcp command, so the workload is shared.
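
For example (the NameNode addresses are placeholders):

    # Copy a directory from one cluster to another; the copy runs as a MapReduce job
    hadoop distcp hdfs://namenode1:8020/src hdfs://namenode2:8020/dest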

52) Mention what is the difference between HDFS and NAS?

HDFS data blocks are distributed across local drives of all machines in a cluster while NAS data is stored on dedicated hardware.

53) Mention how Hadoop is different from other data processing tools?

In Hadoop, you can increase or decrease the number of mappers without worrying about the volume of data to be processed.

54) Mention what does the JobConf class do?

The JobConf class separates different jobs running on the same cluster. It handles job-level settings, such as declaring a job in a real environment.

55) Mention what is the Hadoop MapReduce API contract for a key and value class?

For a key and value class, there are two Hadoop MapReduce API contracts (a sketch follows the list):

    The value class must implement the org.apache.hadoop.io.Writable interface
    The key class must implement the org.apache.hadoop.io.WritableComparable interface
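
A minimal custom key type satisfying this contract might look like the following sketch (the class is illustrative):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    // A key must serialize itself and define a sort order for the shuffle.
    public class YearWritable implements WritableComparable<YearWritable> {
        private int year;

        public void write(DataOutput out) throws IOException { out.writeInt(year); }
        public void readFields(DataInput in) throws IOException { year = in.readInt(); }
        public int compareTo(YearWritable other) { return Integer.compare(year, other.year); }
    }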

56) Mention what are the three modes in which Hadoop can be run?

The three modes in which Hadoop can be run are

    Pseudo distributed mode
    Standalone (local) mode
    Fully distributed mode

57) Mention what does the text input format do?

The text input format creates a record for each line of the file. The value is the whole line of text, while the key is the byte offset of the line within the file. The mapper receives the value as the ‘Text’ parameter and the key as the ‘LongWritable’ parameter.

58) Mention how many InputSplits will be made by a Hadoop framework for a 64K file, a 65MB file and a 127MB file (with a 64MB block size)?

Hadoop will make 5 splits:

    1 split for the 64K file
    2 splits for the 65MB file
    2 splits for the 127MB file

59) Mention what is distributed cache in Hadoop?

Distributed cache in Hadoop is a facility provided by the MapReduce framework. It is used to cache files at the time of execution of the job. The framework copies the necessary files to the slave node before the execution of any task at that node.

60) Explain how the Hadoop CLASSPATH plays a vital role in stopping or starting Hadoop daemons?

The CLASSPATH consists of a list of directories containing the JAR files needed to stop or start the daemons.
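
You can print the classpath the daemons will use with:

    # Prints the class path needed to reach the Hadoop jars and their dependencies
    hadoop classpath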

Big Data

Big data is defined as the voluminous amount of structured, unstructured or semi-structured data that has huge potential for mining but is so large that it cannot be processed using traditional database systems. Big data is characterized by its high velocity, volume and variety, which require cost-effective and innovative methods for information processing to draw meaningful business insights. More than the volume of the data, it is the nature of the data that defines whether it is considered Big Data or not.

What do the four V’s of Big Data denote?
IBM has a nice, simple explanation for the four critical features of big data:
a) Volume – Scale of data
b) Velocity – Analysis of streaming data
c) Variety – Different forms of data
d) Veracity – Uncertainty of data

How does big data analysis help businesses increase their revenue? Give an example.
Big data analysis is helping businesses differentiate themselves. For example, Walmart, the world’s largest retailer in 2014 in terms of revenue, is using big data analytics to increase its sales through better predictive analytics, providing customized recommendations and launching new products based on customer preferences and needs. Walmart observed a significant 10% to 15% increase in online sales, amounting to $1 billion in incremental revenue. Many more companies, such as Facebook, Twitter, LinkedIn, Pandora, JPMorgan Chase and Bank of America, are using big data analytics to boost their revenue.

Differentiate between Structured and Unstructured data.
Data which can be stored in traditional database systems in the form of rows and columns, for example online purchase transactions, is referred to as structured data. Data which can be stored only partially in traditional database systems, for example data in XML records, is referred to as semi-structured data. Unorganized and raw data that cannot be categorized as semi-structured or structured is referred to as unstructured data. Facebook updates, tweets on Twitter, reviews, web logs, etc. are all examples of unstructured data.

Hadoop Interview Questions and Answers-1

1.What are real-time industry applications of Hadoop?
    Hadoop, well known as Apache Hadoop, is an open-source software platform for scalable and distributed computing of large volumes of data. It provides rapid, high-performance and cost-effective analysis of structured and unstructured data generated on digital platforms and within the enterprise. It is used in almost all departments and sectors today.
    Some of the instances where Hadoop is used:
    • Managing traffic on streets
    • Streaming processing
    • Content Management and Archiving Emails
    • Processing Rat Brain Neuronal Signals using a Hadoop Computing Cluster
    • Fraud detection and Prevention
    • Advertisements Targeting Platforms are using Hadoop to capture and analyze click stream, transaction, video and social media data
    • Managing content, posts, images and videos on social media platforms
    • Analyzing customer data in real-time for improving business performance
    • Public sector fields such as intelligence, defense, cyber security and scientific research
    • Financial agencies are using Big Data Hadoop to reduce risk, analyze fraud patterns, identify rogue traders, more precisely target their marketing campaigns based on customer segmentation, and improve customer satisfaction
    • Getting access to unstructured data like output from medical devices, doctor’s notes, lab results, imaging reports, medical correspondence, clinical data, and financial data.

2.How is Hadoop different from other parallel computing systems?
Hadoop is a distributed file system, which lets you store and handle massive amounts of data on a cloud of machines, handling data redundancy. The primary benefit is that since data is stored in several nodes, it is better to process it in a distributed manner. Each node can process the data stored on it instead of spending time moving it over the network.
On the contrary, in a relational database computing system, you can query data in real-time, but it is not efficient to store data in tables, records and columns when the data is huge.
Hadoop also provides a scheme to build a column database with Hadoop HBase, for runtime queries on rows.

3.What all modes Hadoop can be run in?
    Hadoop can run in three modes:
    1. Standalone Mode: The default mode of Hadoop, it uses the local file system for input and output operations. This mode is mainly used for debugging purposes, and it does not support the use of HDFS. Further, in this mode, no custom configuration is required for the mapred-site.xml, core-site.xml and hdfs-site.xml files. It is much faster than the other modes.
    2. Pseudo-Distributed Mode (Single Node Cluster): In this case, you need configuration for all three files mentioned above. All daemons are running on one node, and thus both the Master and Slave nodes are the same.
    3. Fully Distributed Mode (Multiple Node Cluster): This is the production phase of Hadoop (what Hadoop is known for), where data is used and distributed across several nodes on a Hadoop cluster. Separate nodes are allotted as Master and Slaves.

4.Explain the major difference between HDFS block and InputSplit.
    In simple terms, a block is the physical representation of data, while a split is the logical representation of the data present in the block. Split acts as an intermediary between the block and the mapper. Suppose we have two blocks:
    Block 1: ii nntteell
    Block 2: Ii ppaatt
    Now, considering the map, it will read the first block from ii till ll, but it does not know how to process the second block at the same time. Here Split comes into play: it forms a logical group of Block 1 and Block 2 as a single block. It then forms key-value pairs using the InputFormat and record reader, and sends the map for further processing. With InputSplit, if you have limited resources, you can increase the split size to limit the number of maps. For instance, if there are 10 blocks of 64MB each (640MB in total) and there are limited resources, you can assign a 'split size' of 128MB. This will form a logical group of 128MB, with only 5 maps executing at a time.
    However, if the 'split size' property is not set, the whole file will form one InputSplit and be processed by a single map, consuming more time when the file is bigger.

5.What is distributed cache and what are its benefits?
    Distributed Cache, in Hadoop, is a service by the MapReduce framework to cache files when needed. Once a file is cached for a specific job, Hadoop will make it available on each data node, both in the file system and in memory, where map and reduce tasks are executing. Later, you can easily access and read the cache file and populate any collection (like an array or hashmap) in your code.
Benefits of using distributed cache are:
• It distributes simple, read-only text/data files and/or complex types like jars, archives and others. These archives are then un-archived at the slave node.
• Distributed cache tracks the modification timestamps of cache files, so the cache files should not be modified while a job is executing.

6.Explain the difference between NameNode, Checkpoint NameNode and BackupNode.
    NameNode is the core of HDFS that manages the metadata – the information of what file maps to what block locations and what blocks are stored on what datanode. In simple terms, it’s the data about the data being stored. NameNode supports a directory tree-like structure consisting of all the files present in HDFS on a Hadoop cluster. It uses following files for namespace:
    fsimage file- It keeps track of the latest checkpoint of the namespace.
    edits file-It is a log of changes that have been made to the namespace since checkpoint.
    Checkpoint NameNode has the same directory structure as the NameNode, and it creates checkpoints for the namespace at regular intervals by downloading the fsimage and edits files and merging them within the local directory. The new image after merging is then uploaded to the NameNode.
    There is a similar node, commonly known as the Secondary NameNode, but it does not support the ‘upload to NameNode’ functionality.
    The Backup Node provides similar functionality to the Checkpoint node, enforcing synchronization with the NameNode. It maintains an up-to-date in-memory copy of the file system namespace and doesn’t need to fetch changes at regular intervals. The Backup Node needs to save the current in-memory state to an image file to create a new checkpoint.

7.What are the most common Input Formats in Hadoop?
    There are three most common input formats in Hadoop:
    • Text Input Format: Default input format in Hadoop.
    • Key Value Input Format: used for plain text files where the files are broken into lines
    • Sequence File Input Format: used for reading files in sequence
8.Define DataNode and how does NameNode tackle DataNode failures?
    DataNode stores data in HDFS; it is a node where actual data resides in the file system. Each datanode sends a heartbeat message to notify that it is alive. If the namenode does noit receive a message from datanode for 10 minutes, it considers it to be dead or out of place, and starts replication of blocks that were hosted on that data node such that they are hosted on some other data node.A BlockReport contains list of all blocks on a DataNode. Now, the system starts to replicate what were stored in dead DataNode.
    The NameNode manages the replication of data blocksfrom one DataNode to other. In this process, the replication data transfers directly between DataNode such that the data never passes the NameNode.
9.What are the core methods of a Reducer?
    The three core methods of a Reducer are:
    1. setup(): this method is used for configuring various parameters like input data size and distributed cache.
    protected void setup(Context context)
    2. reduce(): the heart of the reducer, called once per key with the associated list of values.
    protected void reduce(Key key, Iterable<Value> values, Context context)
    3. cleanup(): this method is called to clean up temporary files, only once at the end of the task.
    protected void cleanup(Context context)
10.What is SequenceFile in Hadoop?
    Extensively used in MapReduce I/O formats, SequenceFile is a flat file containing binary key/value pairs. The map outputs are stored as SequenceFile internally. It provides Reader, Writer and Sorter classes. The three SequenceFile formats are:
    1. Uncompressed key/value records.
    2. Record compressed key/value records – only ‘values’ are compressed here.
    3. Block compressed key/value records – both keys and values are collected in ‘blocks’ separately and compressed. The size of the ‘block’ is configurable.
11.What is Job Tracker role in Hadoop?
    The Job Tracker’s primary function is resource management (managing the TaskTrackers), tracking resource availability, and task life cycle management (tracking task progress and fault tolerance).
    • It is a process that runs on a separate node, often not on a DataNode
    • The Job Tracker communicates with the NameNode to identify data locations
    • It finds the best TaskTracker nodes to execute tasks on given nodes
    • It monitors individual TaskTrackers and submits the overall job back to the client
    • It tracks the execution of MapReduce workloads local to the slave node
12.What is the use of RecordReader in Hadoop?
    Since Hadoop splits data into various blocks, RecordReader is used to read the split data into a single record. For instance, if our input data is split like:
    Row1: Welcome to
    Row2: Intellipaat
    it will be read as “Welcome to Intellipaat” using RecordReader.
13.What is Speculative Execution in Hadoop?
    One limitation of Hadoop is that by distributing the tasks on several nodes, there are chances that a few slow nodes limit the rest of the program. There are various reasons for tasks to be slow, which are sometimes not easy to detect. Instead of identifying and fixing the slow-running tasks, Hadoop tries to detect when a task runs slower than expected and then launches an equivalent task as a backup. This backup mechanism in Hadoop is Speculative Execution.
    It creates a duplicate task on another disk. The same input can be processed multiple times in parallel. When most tasks in a job come to completion, the speculative execution mechanism schedules duplicate copies of the remaining (slower) tasks across the nodes that are currently free. When these tasks finish, the JobTracker is informed. If other copies are executing speculatively, Hadoop notifies the TaskTrackers to quit those tasks and reject their output.
    Speculative execution is true by default in Hadoop. To disable it, set the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false.
14.What happens if you try to run a Hadoop job with an output directory that is already present?
    It will throw an exception saying that the output file directory already exists. To run the MapReduce job, you need to ensure that the output directory does not already exist in HDFS. To delete the directory before running the job, you can use the shell:
    hadoop fs -rmr /path/to/your/output/
    Or use the Java API:
    FileSystem.getLocal(conf).delete(outputDir, true);

15.How can you debug Hadoop code?
    First, check the list of MapReduce jobs currently running. Next, check that there are no orphaned jobs running; if there are, you need to determine the location of the RM logs.
    1. Run: “ps -ef | grep -i ResourceManager”
    and look for the log directory in the displayed result. Find out the job-id from the displayed list and check if there is any error message associated with that job.
    2. On the basis of the RM logs, identify the worker node that was involved in execution of the task.
    3. Now, log in to that node and run: “ps -ef | grep -i NodeManager”
    4. Examine the NodeManager log. The majority of errors come from the user-level logs for each map-reduce job.
16.How to configure Replication Factor in HDFS?
    hdfs-site.xml is used to configure HDFS. Changing the dfs.replication property in hdfs-site.xml will change the default replication for all files placed in HDFS.
    You can also modify the replication factor on a per-file basis using the Hadoop FS shell:
    [training@localhost ~]$ hadoop fs -setrep -w 3 /my/file
    Conversely, you can also change the replication factor of all the files under a directory:
    [training@localhost ~]$ hadoop fs -setrep -w 3 -R /my/dir
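
    The same per-file change can be made from the Java API; the path below is a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    FileSystem fs = FileSystem.get(new Configuration());
    // Set the replication factor of a single file to 3.
    fs.setReplication(new Path("/my/file"), (short) 3);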

17.How to compress mapper output but not the reducer output?
    To achieve this compression, you should set:
    conf.set("mapreduce.map.output.compress", "true");
    conf.set("mapreduce.output.fileoutputformat.compress", "false");
18.What is the difference between Map Side join and Reduce Side Join?
    A map side join is performed before the data reaches the map, and you need a strict structure for defining a map side join. On the other hand, a reduce side join (repartitioned join) is simpler than a map side join since the input datasets need not be structured. However, it is less efficient, as it has to go through the sort and shuffle phases, which come with network overheads.
19.How can you transfer data from Hive to HDFS?

By writing the query: hive> insert overwrite directory '/' select * from emp;
You can write your query for the data you want to import from Hive to HDFS. The output you receive will be stored in part files in the specified HDFS path.

20.What companies use Hadoop, any idea?
Yahoo! (the biggest contributor to the creation of Hadoop) – the Yahoo search engine uses Hadoop; Facebook – developed Hive for analysis; Amazon, Netflix, Adobe, eBay, Spotify and Twitter.