For a left outer join with output in tab-delimited format:
join -a 1 -1 2 -2 2 -t '[use ctrl-v tab at the command line to insert the tab]' 1.txt 2.txt
Join is a very powerful command, and much faster than SQL join, there is a great tutorial of different ways of using the command:
Examples of the unix join command Given the files:
::::::::::::::
1.txt
::::::::::::::
1 abc
2 lmn
3 pqr
::::::::::::::
2.txt
::::::::::::::
1 abc
3 lmn
9 opq
if we type:
join 1.txt 2.txtWe get
1 abc abc
3 pqr lmnsince the two files are trying to match on the 1st column, and both have a 1 and a 3 in that column. To tell join to use the second column to join with, we type:
join -1 2 -2 2 1.txt 2.txt (where "-1 2" stands for the 2nd filed of the 1st file, and "-2 2" stands for the 2nd field of the 2nd file) we get:
abc 1 1
lmn 2 3which are just those fields that match in both files. This is refered to as an inner join. Inner joins look for rows that match rows in the other table. The problem with inner joins is that only rows that match between tables are returned.
If we type:
join -a1 -1 2 -2 2 1.txt 2.txt (where "-a1" says include all the records from the first file) we get:
abc 1 1
lmn 2 3
pqr 3this is missing a number for pqr for the second file (since there is no pqr in that file) and is missing "9 opq" from the second file. This is an example of a left outer join, called such because it includes all of the rows from the left (or first) file specified.
If we type:
join -a1 -a2 -1 2 -2 2 1.txt 2.txt (adding "-a2" to tell join to also include all records from the second file) we get:
abc 1 1
lmn 2 3
opq 9
pqr 3Which has all of the records. This is an example of a full outer join since it has all of the rows from both files. (Missing from these examples is the case where we had the -a2 without the -a1. That would have produced a right outer join which contained all of the records from the second (right) file and only those rows from the first (left) file that matched).
While this example has all of the rows from both files, we still have a problem since we can not tell which file the count is for on the last two records.
We now need to format the output using the "-o" command. The options for "-o" are as follows:
A "0" (that.s a zero) means display the join field
A number in the format of X.Y means to display the Y field from the X file (ex 2.1 means display the first field from the second file).
-e "0" says to replace any missing data fields with whatever is in-between the quotes (in this case a zero).
So if we type:
join -a1 -a2 -1 2 -2 2 -o 0 1.1 2.1 -e "0" 1.txt 2.txtwe get:
abc 1 1
lmn 2 3
opq 0 9
pqr 3 0which has all of the records, and fills in the zeros for us.
No comments:
Post a Comment