Searching and Extracting Data from Files

אלכס טימוחוב

I/O Redirection

I/O redirection lets the user send information to a command, or capture information from it, by way of a text file. As described earlier, standard input, standard output, and standard error can each be redirected, and input can be read from text files.

Redirecting Standard Output

To redirect standard output to a file instead of displaying it on the screen, use the > operator followed by the file name. If the file does not exist, it is created; if it does exist, its current contents are overwritten.

To view the contents of the file just created, use the cat command, which prints a file to the screen. The following example demonstrates the operator:

$ echo "Hello World!" > text
$ cat text
Hello World!

In the second invocation, the same file is overwritten with new text:

$ echo "Hello!" > text
$ cat text
Hello!

To add new information at the end of a file, use the >> operator. It also creates the file if it does not already exist. The following example appends a line of text:

$ echo "Hello to you too!" >> text
$ cat text
Hello!
Hello to you too!

The same operator creates a new file when no existing file is found:

$ echo "Hello to you too!" >> text2
$ cat text2
Hello to you too!

Redirecting Standard Error

To redirect only the error messages, use the 2> operator followed by the name of the file to which the errors will be written. If the file does not exist, it is created; otherwise it is overwritten.

As explained, standard error is channel 2. For example, the following command searches /usr and a path named games. The error about the missing games path is written to the file text-error, while standard output is displayed on the screen:

$ find /usr games 2> text-error
/usr
/usr/share
/usr/share/misc
---------Omitted output----------
/usr/games
$ cat text-error
find: `games': No such file or directory
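
The channel numbers can also be written out explicitly. The following is a minimal sketch, not part of the original lesson: it assumes a scratch directory, and the names present, missing, out.txt and err.txt are hypothetical. It shows that 1> is the long form of > and that 2> captures only the error message.

```shell
# Work in a scratch directory so nothing important is overwritten.
cd "$(mktemp -d)"
mkdir present

# "1>" is the explicit spelling of ">"; "2>" catches only error messages.
# "missing" does not exist here, so ls reports an error for it.
# "|| true" keeps a strict shell (set -e) from stopping on the failure.
ls -d present missing 1> out.txt 2> err.txt || true

cat out.txt    # the successful part of the listing
cat err.txt    # the error message about "missing"
```

Keeping the two channels in separate files makes it easy to review errors without scrolling through normal output.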

In the next example the command runs without errors, so nothing is written to text-error:

$ sort /etc/passwd 2> text-error
$ cat text-error

Like standard output, standard error can be appended to an existing file, using the 2>> operator. If the file does not exist, it is created. The first example below appends an error to the existing file; the second creates a new file:

$ sort /etc 2>> text-error
$ cat text-error
sort: read failed: /etc: Is a directory

$ sort /etc/shadow 2>> text-error2
$ cat text-error2
sort: open failed: /etc/shadow: Permission denied

With this kind of redirection, only the error messages go to the file, while standard output is still displayed on the screen.

The special file /dev/null acts as a "bit bucket": it accepts whatever is written to it and discards it. Unwanted output can be redirected there instead of being displayed or saved in an important file:

$ sort /etc 2> /dev/null
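
Discarding the message does not hide the failure itself. As a small sketch (the variable name status is an assumption made for illustration), the exit status of the command can still be inspected after its error output has gone to /dev/null:

```shell
# sort cannot read a directory; the error text is thrown away,
# but the non-zero exit status still reports the failure.
status=0
sort /etc 2> /dev/null || status=$?
echo "sort exit status: $status"
```

This pattern is common in scripts that only need to know whether a command succeeded.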

Redirecting Standard Input

This type of redirection feeds data to a command from a file instead of from the keyboard. It uses the < operator, as shown in the example:

$ cat < text
Hello!
Hello to you too!

Standard input redirection is mostly used with commands that do not accept file arguments. The tr command is one of them. It translates its input by changing characters, for example by deleting every occurrence of a particular character. The following example deletes the character l:

$ tr -d "l" < text
Heo!
Heo to you too!

For more information, consult the man page of tr.
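
Besides deleting characters, tr can replace or squeeze them. The sketch below recreates the text file from the earlier examples in a scratch directory (an assumption, so the real file is left alone); the variable names upper and squeezed are hypothetical.

```shell
cd "$(mktemp -d)"
printf 'Hello!\nHello to you too!\n' > text

# Translate every lowercase letter to its uppercase counterpart.
upper=$(tr 'a-z' 'A-Z' < text)
echo "$upper"

# -s squeezes runs of the listed character into a single occurrence,
# so "too" becomes "to".
squeezed=$(tr -s 'o' < text)
echo "$squeezed"
```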

Here Documents

The << operator behaves differently from the output redirection operators. The input stream it introduces is called a here document: a block of code or text that is passed to a command or an interactive program. Shells such as bash, sh, and csh can take this input directly from the command line, without an intermediate text file.

As the following example shows, the operator feeds data to the command, and the word after it is not a file name. It is the delimiter that marks the end of the input and is not treated as content, so cat does not display it:

$ cat << hello
> hey
> ola
> hello
hey
ola

For more information, consult the man page of the cat command.
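
A here document can feed any command that reads standard input, not only cat. The following sketch assumes a bash-compatible shell; the delimiter words END and EOF and the variable name are arbitrary choices made for illustration. It also shows that quoting the delimiter stops the shell from expanding variables inside the block.

```shell
# Feed three lines to sort; END only marks where the input stops.
sorted=$(sort << END
pear
apple
mango
END
)
echo "$sorted"

name="world"    # hypothetical variable, used only to show expansion

# Unquoted delimiter: the shell expands $name inside the block.
cat << EOF
Hello, $name
EOF

# Quoted delimiter: the block is passed through literally.
cat << 'EOF'
Hello, $name
EOF
```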

Combinations

The first combination sends both standard output and standard error to the same file. The &> and &>> operators do this, with & standing for the combination of channel 1 (output) and channel 2 (errors). The first operator overwrites the existing contents of the file and the second appends to the end of it. Both create the file if it does not exist:

$ find /usr admin &> newfile
$ cat newfile
/usr
/usr/share
/usr/share/misc
---------Omitted output----------
/usr/games
find: `admin': No such file or directory

$ find /etc/calendar &>> newfile
$ cat newfile
/usr
/usr/share
/usr/share/misc
---------Omitted output----------
/usr/games
find: `admin': No such file or directory
/etc/calendar
/etc/calendar/default
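
The &> and &>> operators are a bash convenience. In shells that lack them, the same effect can be obtained by redirecting channel 1 to the file and then pointing channel 2 at channel 1. The sketch below assumes a scratch directory and the hypothetical names present, missing and both.txt.

```shell
cd "$(mktemp -d)"
mkdir present

# "> both.txt" redirects channel 1; "2>&1" then sends channel 2 to the
# same place. The order matters: "2>&1" must come after the file redirect.
ls -d present missing > both.txt 2>&1 || true

# The appending form combines ">>" with "2>&1".
ls -d present >> both.txt 2>&1

cat both.txt
```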

An Example with the cut Command

Let's look at an example that uses the cut command:

$ cut -f 3 -d "/" newfile
share
share
---------Omitted output----------
lib
games
find: `admin': No such file or directory
calendar
calendar

The cut command extracts the selected fields from its input with the -f option, in this case the third field. For cut to locate the field, a delimiter must also be given with the -d option; here the delimiter is the / character.

For more information about cut, consult its man page.
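
The same two options work with any single-character delimiter. As an illustrative sketch, the two sample lines below are written in the /etc/passwd format shown elsewhere in this lesson, and the variable name fields is hypothetical. With : as the delimiter, fields 1 and 7 are the user name and the login shell.

```shell
# -f accepts a comma-separated list of field numbers.
fields=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' | cut -d ":" -f 1,7)
echo "$fields"
```

The selected fields are printed joined by the same delimiter that was used to split the line.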

Command Line Pipes

Redirection is mostly used to store the result of one command so that another command can process it. Such an intermediate step becomes tedious and complicated when the data has to pass through several processes. To avoid it, commands can be connected directly with pipes: the output of the first command automatically becomes the input of the second. The connection is made with the | (vertical bar) operator:

$ cat /etc/passwd | less
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh

In this example, the less command after the pipe operator changes the way the file is displayed. less shows the text so that the user can scroll up and down one line at a time. less is also used to display man pages, as mentioned in previous lessons.

Several pipes can be used at once. The intermediate commands, which receive input, change it, and produce output, are called filters. Take the ls -l command and count the words in the first 10 lines of its output. To do this, use head, which by default displays the first 10 lines of its input, and then count the words with wc:

$ ls -l | head | wc -w
10

As mentioned, head displays only the first 10 lines by default. This behavior can be changed with options; see the command's man page for details.

Another command, tail, displays the end of a file. By default it shows the last 10 lines, and as with head the number of lines can be changed. See the man page of tail for details.

Note: the -f option of tail shows the last lines of a file while the file is being updated. This is useful when monitoring a file such as syslog for ongoing activity.
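
The number of lines is set with the -n option of both commands. The sketch below uses seq to generate the numbers 1 to 20 as sample input (an assumption, chosen so the result is easy to check); the variable names are hypothetical.

```shell
# First three lines of the input.
first=$(seq 1 20 | head -n 3)
echo "$first"

# Last two lines of the input.
last=$(seq 1 20 | tail -n 2)
echo "$last"

# Chaining both selects a range from the middle: lines 11 to 15.
middle=$(seq 1 20 | head -n 15 | tail -n 5)
echo "$middle"
```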

The wc Command

The wc (word count) command by default counts the lines, words, and bytes of a file. In the example above, the -w option tells it to count only the words in the selected lines. The most common options are -l, which counts only lines, and -c, which counts only bytes. See the man page of wc for details.
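
To see the three counters side by side, here is a small sketch on a two-line sample file. The file name sample.txt and its contents are made up for illustration. Reading the file through < keeps the file name out of the output.

```shell
cd "$(mktemp -d)"
printf 'one two three\nfour five\n' > sample.txt

wc -l < sample.txt    # number of lines
wc -w < sample.txt    # number of words
wc -c < sample.txt    # number of bytes, newlines included
```

The sample contains 2 lines, 5 words and 24 bytes; some implementations pad the numbers with leading spaces.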

Searching within Files with grep

The first tool discussed in this lesson is the grep command. The name grep abbreviates the phrase "global regular expression print", and its main function is to search files for a specified pattern. The command prints each line that contains the pattern; when colored output is enabled, the matching text is highlighted, typically in red.

$ grep bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
user:x:1001:1001:User,,,,:/home/user:/bin/bash

Like most commands, grep can be adjusted with options. These are the most common ones:

-i: the search is case insensitive.
-r: the search is recursive (it looks into all files in the given directory and its subdirectories).
-c: the search counts the number of matching lines.
-v: inverts the match, printing the lines that do not match the search pattern.
-E: turns on extended regular expressions (needed for some of the more advanced meta-characters, such as |, + and ?).

grep has many other useful options. Consult its man page to learn more.
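
The options above can be tried on a small sample file. The following sketch makes up a three-line file named shells.txt for the purpose; the file and the variable names are hypothetical.

```shell
cd "$(mktemp -d)"
printf 'Bash is a shell\nzsh is a shell\nbash scripts\n' > shells.txt

# -i ignores case, so both "Bash" and "bash" match.
grep -i "bash" shells.txt

# -c prints the number of matching lines instead of the lines.
count=$(grep -c "shell" shells.txt)
echo "$count"

# -v prints the lines that do NOT contain the pattern.
rest=$(grep -v "shell" shells.txt)
echo "$rest"

# Options can be combined: count matches regardless of case.
both=$(grep -ic "BASH" shells.txt)
echo "$both"
```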

Regular Expressions

The second tool is very powerful: regular expressions, which describe sequences of text within files. They are extremely useful for extracting data from text files by constructing patterns, and they are commonly used in scripts and when programming in high-level languages such as Perl or Python.

When working with regular expressions, keep in mind that every character counts and that a pattern is written to match a specific sequence of characters, known as a string. Most patterns use ordinary symbols such as letters, digits, and punctuation, but Unicode characters can also be used to match any other kind of text.

The following list explains the meta-characters used to build regular expression patterns:

. - matches any single character (except newline).
[abcABC] - matches any one character within the brackets.
[^abcABC] - matches any one character except the ones in the brackets.
[a-z] - matches any character in the range.
[^a-z] - matches any character except the ones in the range.
sun|moon - matches either of the listed strings.
^ - start of a line.
$ - end of a line.

All regular expression features can be used with grep. Note that in the grep example above the search word is not enclosed in double quotes. To keep the shell from interpreting meta-characters itself, it is recommended to put more complex patterns between double quotes (" "). For practice, the examples here use double quotes whenever a regular expression is involved.

Examples

The following examples demonstrate regular expressions. First, add some data to the file text.txt:

$ echo "aaabbb1" > text.txt
$ echo "abab2" >> text.txt
$ echo "noone2" >> text.txt
$ echo "class1" >> text.txt
$ echo "alien2" >> text.txt
$ cat text.txt
aaabbb1
abab2
noone2
class1
alien2

The first example contrasts a search without regular expressions with a search that uses them. The first command looks for the exact string ab anywhere on a line, whereas the second matches any line that contains at least one of the characters in the brackets, a or b. The results therefore differ.

$ grep "ab" text.txt
aaabbb1
abab2
$ grep "[ab]" text.txt
aaabbb1
abab2
class1
alien2

The second set of examples uses the meta-characters for the start and end of a line. Their position in the expression matters: the start-of-line meta-character goes before the pattern, and the end-of-line meta-character goes after it.
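
The two anchors can be tried on the text.txt data from above. This sketch recreates the file in a scratch directory (an assumption, so it can run on its own) and uses single quotes around the second pattern so that the shell leaves the $ untouched; the variable names are hypothetical.

```shell
cd "$(mktemp -d)"
printf 'aaabbb1\nabab2\nnoone2\nclass1\nalien2\n' > text.txt

# ^ before the pattern: lines that start with a.
starts=$(grep "^a" text.txt)
echo "$starts"

# $ after the pattern: lines that end with 2.
ends=$(grep '2$' text.txt)
echo "$ends"
```

The first search prints aaabbb1, abab2 and alien2; the second prints abab2, noone2 and alien2.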

Meta-characters for Extending grep Search Patterns

Regular expressions also provide meta-characters that repeat the preceding pattern:

* - zero or more of the preceding pattern.
+ - one or more of the preceding pattern.
? - zero or one of the preceding pattern.

The following command searches for lines containing ab followed by one or more characters of any kind (. matches any single character and + repeats it). grep finds aaabbb1 and abab2: in the first line the pattern matches abbb1, and in the second it matches abab2. Because + is an extended regular expression meta-character (as are ? and |), grep needs the -E option; * also works without it:

$ grep -E "ab.+" text.txt
aaabbb1
abab2

Most meta-characters are self-explanatory, but they can be tricky the first time they are used. The previous examples cover only a small part of what regular expressions can do. Try all the meta-characters listed above to get a better feel for how they work.
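
As a starting point for such experiments, the sketch below applies each of the three repetition meta-characters to the text.txt data. The file is recreated in a scratch directory (an assumption, so the sketch runs on its own), and the variable names are hypothetical.

```shell
cd "$(mktemp -d)"
printf 'aaabbb1\nabab2\nnoone2\nclass1\nalien2\n' > text.txt

# *: an a, then zero or more b, then 1. Works without -E.
star=$(grep "ab*1" text.txt)
echo "$star"

# ?: b, a, an optional b, then 2. Needs -E.
opt=$(grep -E "bab?2" text.txt)
echo "$opt"

# +: n, one or more o, then ne. Needs -E.
plus=$(grep -E "no+ne" text.txt)
echo "$plus"
```

Each search matches exactly one line: aaabbb1, abab2 and noone2 respectively.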

Guided Exercises

Using the grep command and the file /usr/share/hunspell/en_US.dic, find the lines that match the following criteria:

All lines containing the word cat anywhere on the line.
All lines that do not contain any of the following characters: sawgtfixk.
All lines that start with any 3 letters followed by the word dig.
All lines that end with at least one e.
All lines that contain one of the following words: org, kay, or tuna.
The number of lines that start with one or no c, followed by the string ati.

Explorational Exercises

Find the regular expression that matches the words in the "Include" line and does not match the ones in the "Exclude" line.
Which other command can be used to search within files? What additional functionality does it provide?
Go back to the previous lesson and use grep to look for a specific pattern in the output of one of its commands.

Summary

In this lesson you learned:

Regular expression meta-characters
How to create patterns with regular expressions
How to search within files

Commands used in this lesson:

grep: searches for characters or strings within a file.

Answers to Guided Exercises

All lines containing the word cat anywhere on the line:

$ grep "cat" /usr/share/hunspell/en_US.dic
Alcatraz/M
Decatur/M
Hecate/M

All lines that do not contain any of the following characters: sawgtfixk:

$ grep -v "[sawgtfixk]" /usr/share/hunspell/en_US.dic
49269
0/nm
...

All lines that start with any 3 letters followed by the word dig:

$ grep "^...dig" /usr/share/hunspell/en_US.dic
cardigan/SM
condign
...

All lines that end with at least one e:

$ grep -E "e+$" /usr/share/hunspell/en_US.dic
Anglicize
Anthropocene
...

All lines that contain one of the words org, kay, or tuna:

$ grep -E "org|kay|tuna" /usr/share/hunspell/en_US.dic
Borg/SM
George/MS
Tokay/M
...

The number of lines that start with one or no c, followed by the string ati:

$ grep -cE "^c?ati" /usr/share/hunspell/en_US.dic
3

Answers to Explorational Exercises

Find the regular expression that matches the words in the "Include" line and does not match the ones in the "Exclude" line.

◦ Include: pot, spot, apot | Exclude: potic, spots, potatoe | Answer: pot$
◦ Include: arp99, apple, zipper | Exclude: zoo, arive, attack | Answer: p+
◦ Include: arcane, capper, zoology | Exclude: air, coper, zoloc | Answer: arc|cap|zoo
◦ Include: 0th/pt, 3th/tc, 9th/pt | Exclude: 0/nm, 3/nm, 9/nm | Answer: [0-9]th.+
◦ Include: Hawaii, Dario, Ramiro | Exclude: hawaii, Ian, Alice | Answer: ^[A-Z]a.*i+

The sed command is also used to search within files. In addition, it can find and replace characters in a file.
One possibility is to combine cat and tr with grep to look for a specific pattern. For example:

$ cat contents.txt | tr -s " " | grep "^....rwx"

I/O Redirection
I/O redirection enables the user to redirect information from or to a command by using a text file. As
described earlier, the standard input, output and error output can be redirected, and the information
can be taken from text files.
Linux Essentials (Version 1.6) | 3.2 Searching and Extracting Data from Files
Version: 2022-04-29 | Licensed for Cyber School. | learning.lpi.org | 185
Redirecting Standard Output
To redirect standard output to a file, instead of the screen, we need to use the > operator followed by
the name of the file. If the file doesn’t exist, a new one will be created, otherwise, the information
will overwrite the existing file.
In order to see the contents of the file that we just created, we can use the cat command. By default,
this command displays the contents of a file on the screen. Consult the manual page to find out more
about its functionalities.
The example below demonstrates the functionality of the operator. In the first instance, a new file is
created containing the text “Hello World!”:
$ echo "Hello World!" > text
$ cat text
Hello World!
In the second invocation, the same file is overwritten with the new text:
$ echo "Hello!" > text
$ cat text
Hello!
If we want to add new information at the end of the file, we need to use the >> operator. This
operator also creates a new file if it cannot find an existing one.
The first example shows the addition of the text. As it can be seen, the new text was added on the
following line:
$ echo "Hello to you too!" >> text
$ cat text
Hello!
Hello to you too!
The second example demonstrates that a new file will be created:
$ echo "Hello to you too!" >> text2
$ cat text2
Hello to you too!

Redirecting Standard Error
In order to redirect just the error messages, a user will need to employ the 2> operator followed by
the name of the file in which the errors will be written. If the file doesn’t exist, a new one will be
created, otherwise the file will be overwritten.
As explained, the channel for standard error is channel 2. When redirecting standard error, the
channel must be specified, unlike standard output, where channel 1 is the default. For example, the
following command searches /usr and a path named games, writes only the resulting error into the
text-error file, and displays the standard output on the screen:
$ find /usr games 2> text-error
/usr
/usr/share
/usr/share/misc
---------Omitted output----------
/usr/lib/libmagic.so.1.0.0
/usr/lib/libdns.so.81
/usr/games
$ cat text-error
find: `games': No such file or directory
NOTE For more information about the find command, consult its man page.
For example, the following command will run without errors, therefore no information will be
written in the file text-error:
$ sort /etc/passwd 2> text-error
$ cat text-error
As well as the standard output, the standard error can also be appended to a file with the 2>>
operator. This will add the new error at the end of the file. If the file doesn’t exist, a new one will be
created. The first example shows the addition of the new information into the file, whereas the
second example shows that the command creates a new file where an existing one can’t be found
with the same name:

$ sort /etc 2>> text-error
$ cat text-error
sort: read failed: /etc: Is a directory
$ sort /etc/shadow 2>> text-error2
$ cat text-error2
sort: open failed: /etc/shadow: Permission denied
Using this type of redirection, only the error messages will be redirected to the file, the normal
output will be written on the screen or go through standard output or stdout.
There is one particular file that technically is a bit bucket (a file that accepts input and doesn’t do
anything with it): /dev/null. You can redirect any irrelevant information that you might not want
displayed or redirected into an important file, as shown in the example below:
$ sort /etc 2> /dev/null

Redirecting Standard Input
This type of redirection is used to input data to a command, from a specified file instead of a
keyboard. In this case the < operator is used as shown in the example:
$ cat < text
Hello!
Hello to you too!
Redirecting standard input is usually used with commands that don’t accept file arguments. The tr
command is one of them. It translates its input by modifying characters in specific ways, such as
deleting every occurrence of a particular character. The example below deletes the character l:
$ tr -d "l" < text
Heo!
Heo to you too!
For more information, consult the man page of tr.

Here Documents
The << operator behaves differently from the output redirection operators. The input stream it
introduces is called a here document: a block of code or text that is redirected to a command or an
interactive program. Shells such as bash, sh and csh can take this input directly from the command
line, without using any text files.
As can be seen in the example below, the operator is used to input data into the command, while the
word after doesn’t specify the file name. The word is interpreted as the delimiter of the input and it
will not be taken in consideration as content, therefore cat will not display it:
$ cat << hello
> hey
> ola
> hello
hey
ola
Consult the man page of the cat command to find more information.
Combinations
The first combination that we will explore combines the redirection of the standard output and
standard error output to the same file. The &> and &>> operators are used, & representing the
combination of channel 1 and channel 2. The first operator will overwrite the existing contents of the
file and the second one will append or add the new information at the end of the file. Both operators
will enable the creation of the new file if it doesn’t exist, just like in the previous sections:
$ find /usr admin &> newfile
$ cat newfile
/usr
/usr/share
/usr/share/misc
---------Omitted output----------
/usr/lib/libmagic.so.1.0.0
/usr/lib/libdns.so.81
/usr/games
find: `admin': No such file or directory
$ find /etc/calendar &>> newfile
$ cat newfile
/usr
/usr/share
/usr/share/misc
---------Omitted output----------
/usr/lib/libmagic.so.1.0.0
/usr/lib/libdns.so.81
/usr/games
find: `admin': No such file or directory
/etc/calendar
/etc/calendar/default
Let’s take a look at an example using the cut command:
$ cut -f 3 -d "/" newfile
share
share
share
---------Omitted output----------
lib
games
find: `admin': No such file or directory
calendar
calendar
find: `admin': No such file or directory
The cut command cuts specified fields from the input file by using the -f option, the 3rd field in our
case. In order for the command to find the field, a delimiter needs to be specified as well with the -d
option. In our case the delimiter will be the / character.
Linux Essentials (Version 1.6) | Topic 3: The Power of the Command Line
190 | learning.lpi.org | Licensed for Cyber School. | Version: 2022-04-29
To find more about the cut command, consult its man page.

Command Line Pipes
Redirection is mostly used to store the result of a command, to be processed by a different command.
This type of intermediate process can become very tedious and complicated if you want the data to
go through multiple processes. In order to avoid this, you can link the command directly via pipes. In
other words, the first command’s output automatically becomes the second command’s input. This
connection is made by using the | (vertical bar) operator:
$ cat /etc/passwd | less
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
:
In the example above, the less command after the pipe operator modifies the way that the file is
displayed. The less command displays the text file, allowing the user to scroll up and down one line
at a time. less is also used by default to display the man pages, as discussed in the previous lessons.
It is possible to use multiple pipes at the same time. The intermediate commands that receive input
then change it and produce output are called filters. Let’s take the ls -l command and try to count
the number of words from the first 10 lines of the output. In order to do this, we will have to use the
head command that by default displays the first 10 lines of a file and then count the words using the
wc command:
$ ls -l | head | wc -w
10
As mentioned earlier, by default, head only displays the first 10 lines of the text file specified. This
behaviour can be modified by using specific options. Check the command’s man page to find more.
There is another command that displays the end of a file: tail. By default, this command selects the
last 10 lines and displays them but, as with head, the number can be modified. Check tail’s man page
for more details.
NOTE The -f option can show the last lines of a file while it’s being updated. This feature
can become very useful when monitoring a file like syslog for ongoing activity.
The wc (word count) command counts by default the lines, words and bytes of a file. As shown in the
exercise, the -w option causes the command to only count the words within the selected lines. The
most common options that you can use with this command are: -l, which specifies to the command
to only count the lines, and -c, which is used to count only the bytes. More variations and options of
the command, as well as more information about wc can be found within the command’s man page.
Guided Exercises
1. List the contents of your current directory, including the ownership and permissions, and
redirect the output to a file called contents.txt within your home directory.
2. Sort the contents of the contents.txt file from your current directory and append it to the end
of a new file named contents-sorted.txt.
3. Display the last 10 lines of the /etc/passwd file and redirect it to a new file in your user’s
Documents directory.
4. Count the number of words within the contents.txt file and append the output to the end of a
file field2.txt in your home directory. You will need to use both input and output redirection.
5. Display the first 5 lines of the /etc/passwd file and sort the output reverse alphabetically.
6. Using the previously created contents.txt file, count the number of characters of the last 9
lines.
7. Count the number of files called test within the /usr/share directory and its subdirectories.
Note: each line output from the find command represents a file.
Explorational Exercises
1. Select the second field of the contents.txt file and redirect the standard output and error output
to another file called field1.txt.
2. Using the input redirection operator and the tr command, delete the dashes (-) from the
contents.txt file.
3. What is the biggest advantage of only redirecting errors to a file?
4. Replace all recurrent spaces within the alphabetically sorted contents.txt file with a single
space.
5. In one command line, eliminate the recurrent spaces (as done in the previous exercise), select the
ninth field and sort it reverse alphabetically and non-case sensitive. How many pipes did you
have to use?
Summary
In this lab you learned:
• Types of redirection
• How to use the redirection operators
• How to use pipes to filter command output
Commands used in this lesson:
cut
Removes sections from each line of a file.
cat
Displays or concatenates files.
find
Searches for files in a directory hierarchy.
less
Displays a file, allowing the user to scroll one line at a time.
more
Displays a file, one page at a time.
head
Displays the first 10 lines of a file.
tail
Displays the last 10 lines of a file.
sort
Sorts files.
wc
Counts by default the lines, words and bytes of a file.
Answers to Guided Exercises
1. List the contents of your current directory, including the ownership and permissions, and
redirect the output to a file called contents.txt within your home directory.
$ ls -l > contents.txt
2. Sort the contents of the contents.txt file from your current directory and append it to the end
of a new file named contents-sorted.txt.
$ sort contents.txt >> contents-sorted.txt
3. Display the last 10 lines of the /etc/passwd file and redirect it to a new file in your user’s
Documents directory.
$ tail /etc/passwd > Documents/newfile
4. Count the number of words within the contents.txt file and append the output to the end of a
file field2.txt in your home directory. You will need to use both input and output redirection.
$ wc -w < contents.txt >> field2.txt
5. Display the first 5 lines of the /etc/passwd file and sort the output reverse alphabetically.
$ head -n 5 /etc/passwd | sort -r
6. Using the previously created contents.txt file, count the number of characters of the last 9
lines.
$ tail -n 9 contents.txt | wc -c
531
7. Count the number of files called test within the /usr/share directory and its subdirectories.
Note: each line output from the find command represents a file.
$ find /usr/share -name test | wc -l
125

Answers to Explorational Exercises
1. Select the second field of the contents.txt file and redirect the standard output and error output
to another file called field1.txt.
$ cut -f 2 -d " " contents.txt &> field1.txt
2. Using the input redirection operator and the tr command, delete the dashes (-) from the
contents.txt file.
$ tr -d "-" < contents.txt
3. What is the biggest advantage of only redirecting errors to a file?
Only redirecting errors to a file can help with keeping a log file that is monitored frequently.
4. Replace all recurrent spaces within the alphabetically sorted contents.txt file with a single
space.
$ sort contents.txt | tr -s " "
5. In one command line, eliminate the recurrent spaces (as done in the previous exercise), select the
ninth field and sort it reverse alphabetically and non-case sensitive. How many pipes did you
have to use?
$ cat contents.txt | tr -s " " | cut -f 9 -d " " | sort -fr
The exercise uses 3 pipes, one for each filter.
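
As a variation on this answer, the first stage can read the file through input redirection instead of cat, which leaves two pipes. The sketch below is only an illustration: it builds a hypothetical contents.txt with three made-up lines in the ls -l layout, so the result is predictable, and stores it in a variable named names.

```shell
cd "$(mktemp -d)"

# Sample listing in "ls -l" format; the entries are invented for the demo.
printf '%s\n' \
    '-rw-r--r--  1 user user  120 Jan  1 10:00 beta.txt' \
    '-rw-r--r--  1 user user   80 Jan  2 11:00 Alpha.txt' \
    'drwxr-xr-x  2 user user 4096 Jan  3 12:00 gamma' > contents.txt

# tr reads the file directly via "<", so no cat stage is needed.
names=$(tr -s " " < contents.txt | cut -f 9 -d " " | sort -fr)
echo "$names"
```

Both forms produce the same output; the redirected form simply starts one process fewer.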

Searching within Files with grep
The first tool that we will discuss in this lesson is the grep command. grep is the abbreviation of the
phrase “global regular expression print” and its main functionality is to search within files for the
specified pattern. The command outputs the line containing the specified pattern highlighted in red.
$ grep bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
user:x:1001:1001:User,,,,:/home/user:/bin/bash
grep, as most commands, can also be tweaked by using options. Here are the most common ones:
-i
the search is case insensitive
-r
the search is recursive (it searches into all files within the given directory and its subdirectories)
-c
the search counts the number of matches
-v
invert the match, to print lines that do not match the search term
-E
turns on extended regular expressions (needed by some of the more advanced meta-characters
like | , + and ?)
grep has many other useful options. Consult the man page to find out more about it.
Regular Expressions
The second tool is very powerful: regular expressions, which describe bits of text within files.
Regular expressions are extremely useful in extracting data from text files by
constructing patterns. They are commonly used within scripts or when programming with high
level languages, such as Perl or Python.
When working with regular expressions, it is very important to keep in mind that every character
counts and that a pattern is written to match a specific sequence of characters, known as a string.
Most patterns use ordinary ASCII symbols, such as letters, digits, punctuation or other symbols, but
they can also use Unicode characters in order to match any other type of text.
The following list explains the regular expression meta-characters that are used to form the
patterns.
.
Match any single character (except newline)
[abcABC]
Match any one character within the brackets
[^abcABC]
Match any one character except the ones in the brackets
[a-z]
Match any character in the range
[^a-z]
Match any character except the ones in the range
sun|moon
Find either of the listed strings
^
Start of a line
$
End of a line
All of the regular expression features described here can be used with grep. In the first grep
example (grep bash /etc/passwd), the search word is not surrounded by double quotes. To prevent
the shell from interpreting the meta-characters itself, it is recommended to put more complex
patterns between double quotes (" "). For the purpose of practice, we will use double quotes
whenever we apply regular expressions. The other quotation marks keep their normal functionality,
as discussed in previous lessons.
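The effect of leaving a pattern unquoted can be seen in a small experiment. This is only a sketch: it assumes a temporary directory that happens to contain a file named b, so that the shell treats [ab] as a file name pattern and expands it before grep runs.

```shell
# Hypothetical setup: a data file plus a file named "b" in the same directory.
workdir=$(mktemp -d)
printf '%s\n' "class1" "abab2" > "$workdir/text.txt"
touch "$workdir/b"

# Unquoted: the shell replaces [ab] with the matching file name "b",
# so grep actually searches for the literal letter b.
unquoted=$(cd "$workdir" && grep -c [ab] text.txt)

# Quoted: grep receives the pattern [ab] unchanged.
quoted=$(cd "$workdir" && grep -c "[ab]" text.txt)

echo "unquoted: $unquoted, quoted: $quoted"
# prints: unquoted: 1, quoted: 2
```

The unquoted command silently gives a different answer instead of failing, which is why quoting the pattern is the safer habit.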
The following examples illustrate how regular expressions work. We need some data in the file
first, so the next set of commands writes several strings to the text.txt file: the first command
creates the file and the others append to it.
$ echo "aaabbb1" > text.txt
$ echo "abab2" >> text.txt
$ echo "noone2" >> text.txt
$ echo "class1" >> text.txt
$ echo "alien2" >> text.txt
$ cat text.txt
aaabbb1
abab2
noone2
class1
alien2
The first example contrasts a search without regular expressions with a search that uses them.
Seeing the difference is important for understanding regular expressions. The first command
searches for the exact string ab anywhere in the line, whereas the second command searches for
lines that contain at least one of the characters between the brackets, that is, an a or a b.
Therefore, the results of the commands are different.
$ grep "ab" text.txt
aaabbb1
abab2
$ grep "[ab]" text.txt
aaabbb1
abab2
class1
alien2
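The remaining meta-characters from the list can be tried on the same five strings. The sketch below recreates text.txt in a temporary directory (an assumption made so that it runs on its own) and stores each result in a variable whose name is chosen only for illustration.

```shell
# Recreate the lesson's text.txt in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' "aaabbb1" "abab2" "noone2" "class1" "alien2" > "$workdir/text.txt"

# . matches any single character: an a, any character, then an a
dot=$(grep "a.a" "$workdir/text.txt")            # aaabbb1, abab2

# [^a] matches one character that is not an a: an l followed by anything but a
negated=$(grep "l[^a]" "$workdir/text.txt")      # alien2

# [c-n] matches one character in the range c to n, here directly before a 2
range=$(grep "[c-n]2" "$workdir/text.txt")       # noone2, alien2

# | matches either string and needs extended regular expressions (-E)
either=$(grep -E "class|alien" "$workdir/text.txt")   # class1, alien2

printf '%s\n' "$dot" "$negated" "$range" "$either"
```

The line class1 is not matched by "l[^a]" because its only l is followed by an a, and abab2 is not matched by "[c-n]2" because b lies outside the range.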
The second set of examples applies the beginning-of-line and end-of-line meta-characters. It is
important to put each of these 2 characters in the right place in the expression. To match at the
beginning of the line, the ^ meta-character goes before the expression, whereas to match at the
end of the line, the $ meta-character goes after the expression.

$ grep "^a" text.txt
aaabbb1
abab2
alien2
$ grep "2$" text.txt
abab2
noone2
alien2
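Both anchors can also be used in a single pattern to describe a whole line, and they combine with the options shown earlier. A minimal sketch, again recreating text.txt in a temporary directory so that it is self-contained:

```shell
# Recreate the lesson's text.txt in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' "aaabbb1" "abab2" "noone2" "class1" "alien2" > "$workdir/text.txt"

# ^ and $ together: the line must start with a AND end with 2,
# with any characters (.*) in between.
whole=$(grep "^a.*2$" "$workdir/text.txt")
echo "$whole"
# prints:
# abab2
# alien2

# -v with an anchor: all lines that do NOT start with a
rest=$(grep -v "^a" "$workdir/text.txt")
echo "$rest"
# prints:
# noone2
# class1
```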
In addition to the meta-characters explained previously, regular expressions also have
meta-characters that repeat the preceding pattern:
*
Zero or more of the preceding pattern
+
One or more of the preceding pattern
?
Zero or one of the preceding pattern
As an example of the multiplier meta-characters, the command below searches for a string that
contains ab followed by one or more characters of any kind: the . matches any single character and
the + repeats it one or more times. The result shows that grep found the aaabbb1 line, where the
abbb1 part matches, as well as abab2. Since the + character is an extended regular expression
character, we need to pass the -E option to the grep command.
$ grep -E "ab.+" text.txt
aaabbb1
abab2
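The difference between the three multipliers is easiest to see when they are applied to the same character. The following sketch recreates text.txt in a temporary directory (an assumption for self-containment) and compares *, + and ?; the variable names are illustrative only.

```shell
# Recreate the lesson's text.txt in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' "aaabbb1" "abab2" "noone2" "class1" "alien2" > "$workdir/text.txt"

# * : zero or more b before a final 1, so class1 matches too (zero b)
star=$(grep "b*1$" "$workdir/text.txt")
echo "$star"
# prints:
# aaabbb1
# class1

# + : at least one b before the final 1 (extended, needs -E)
plus=$(grep -E "b+1$" "$workdir/text.txt")
echo "$plus"
# prints:
# aaabbb1

# ? : an a, an optional b, then another a, i.e. "aa" or "aba"
optional=$(grep -E "ab?a" "$workdir/text.txt")
echo "$optional"
# prints:
# aaabbb1
# abab2
```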
Most of the meta-characters are self-explanatory, but they can be tricky when used for the first
time. The previous examples show only a small part of what regular expressions can do. Try all the
meta-characters from the lists above to better understand how they work.
Guided Exercises
Using grep and the /usr/share/hunspell/en_US.dic file, find the lines that match the following
criteria:
1. All lines containing the word cat anywhere on the line.
2. All lines that do not contain any of the following characters: sawgtfixk.
3. All lines that start with any 3 letters and the word dig.
4. All lines that end with at least one e.
5. All lines that contain one of the following words: org, kay or tuna.
6. Number of lines that start with one or no c followed by the string ati.
Explorational Exercises
1. Find the regular expression that matches the words in the “Include” line and doesn’t match the
ones in the “Exclude” line:
◦ Include: pot, spot, apot
Exclude: potic, spots, potatoe
◦ Include: arp99, apple, zipper
Exclude: zoo, arive, attack
◦ Include: arcane, capper, zoology
Exclude: air, coper, zoloc
◦ Include: 0th/pt, 3th/tc, 9th/pt
Exclude: 0/nm, 3/nm, 9/nm
◦ Include: Hawaii, Dario, Ramiro
Exclude: hawaii, Ian, Alice
2. What other useful command is commonly used to search within the files? What additional
functionalities does it have?
3. Thinking back to the previous lesson, use one of the examples and try to look for a specific
pattern within the output of the command, with the help of grep.
Summary
In this lab you learned:
• Regular expressions meta-characters
• How to create patterns with regular expressions
• How to search within the files
Commands used in the exercises:
grep
Searches for characters or strings within a file
Answers to Guided Exercises
Using grep and the /usr/share/hunspell/en_US.dic file, find the lines that match the following
criteria:
1. All lines containing the word cat anywhere on the line.
$ grep "cat" /usr/share/hunspell/en_US.dic
Alcatraz/M
Decatur/M
Hecate/M
...
2. All lines that do not contain any of the following characters: sawgtfixk.
$ grep -v "[sawgtfixk]" /usr/share/hunspell/en_US.dic
49269
0/nm
1/n1
2/nm
2nd/p
3/nm
3rd/p
4/nm
5/nm
6/nm
7/nm
8/nm
...
3. All lines that start with any 3 letters and the word dig.
$ grep "^...dig" /usr/share/hunspell/en_US.dic
cardigan/SM
condign
predigest/GDS
...
4. All lines that end with at least one e.
$ grep -E "e+$" /usr/share/hunspell/en_US.dic
Anglicize
Anglophobe
Anthropocene
...
5. All lines that contain one of the following words: org, kay or tuna.
$ grep -E "org|kay|tuna" /usr/share/hunspell/en_US.dic
Borg/SM
George/MS
Tokay/M
fortunate/UY
...
6. Number of lines that start with one or no c followed by the string ati.
$ grep -cE "^c?ati" /usr/share/hunspell/en_US.dic
3
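If the hunspell dictionary is not installed, the same six commands can be practised on a short stand-in list. The sketch below is an assumption-laden illustration: the file name words.dic and its eight entries are made up in the style of the dictionary, so the counts differ from the real answers above.

```shell
# Hypothetical stand-in for /usr/share/hunspell/en_US.dic.
workdir=$(mktemp -d)
dic="$workdir/words.dic"
printf '%s\n' "Decatur/M" "cardigan/SM" "condign" "bee" \
              "tuna/MS" "Tokay/M" "cation/SM" "ration/SM" > "$dic"

q1=$(grep -c "cat" "$dic")              # lines containing cat
q2=$(grep -v "[sawgtfixk]" "$dic")      # lines with none of these characters
q3=$(grep -c "^...dig" "$dic")          # any 3 characters, then dig
q4=$(grep -E "e+$" "$dic")              # lines ending with at least one e
q5=$(grep -cE "org|kay|tuna" "$dic")    # lines containing org, kay or tuna
q6=$(grep -cE "^c?ati" "$dic")          # one or no c, then ati, at line start

echo "1:$q1 2:$q2 3:$q3 4:$q4 5:$q5 6:$q6"
# prints: 1:2 2:bee 3:2 4:bee 5:2 6:1
```

A small list like this makes it possible to check each result by eye before running the pattern against the full dictionary.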
Answers to Explorational Exercises
1. Find the regular expression that matches the words in the “Include” line and doesn’t match the
ones in the “Exclude” line:
◦ Include: pot, spot, apot
Exclude: potic, spots, potatoe
Answer: pot$
◦ Include: arp99, apple, zipper
Exclude: zoo, arive, attack
Answer: p+
◦ Include: arcane, capper, zoology
Exclude: air, coper, zoloc
Answer: arc|cap|zoo
◦ Include: 0th/pt, 3th/tc, 9th/pt
Exclude: 0/nm, 3/nm, 9/nm
Answer: [0-9]th.+
◦ Include: Hawaii, Dario, Ramiro
Exclude: hawaii, Ian, Alice
Answer: ^[A-Z]a.*i+
2. What other useful command is commonly used to search within the files? What additional
functionalities does it have?
The sed command. The command can find and replace characters or sets of characters within a
file.
3. Thinking back to the previous lesson, use one of the examples and try to look for a specific
pattern within the output of the command, with the help of grep.
I took one of the answers from the Explorational Exercises and looked for the line that has read,
write and execute as the group permissions. Your answer might be different, depending on the
command that you chose and the pattern that you created.
$ cat contents.txt | tr -s " " | grep "^....rwx"
This exercise is to show you that grep can also receive input from different commands and it can
help in filtering generated information.
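The answers to the first explorational exercise can be verified in the same way, by feeding words to grep through a pipe. The following is a rough sketch, not part of the original material: the helper function check is hypothetical, and LC_ALL=C is an assumption added so that the range [A-Z] covers only uppercase ASCII letters on every system.

```shell
# check PATTERN "INCLUDE WORDS" "EXCLUDE WORDS"
# Hypothetical helper: prints "ok" when every include word matches the
# pattern and no exclude word does, otherwise prints "fail".
check() {
    pattern=$1
    for word in $2; do
        # -q suppresses output; only the exit status is used
        printf '%s\n' "$word" | LC_ALL=C grep -Eq -- "$pattern" || { echo "fail"; return 1; }
    done
    for word in $3; do
        if printf '%s\n' "$word" | LC_ALL=C grep -Eq -- "$pattern"; then
            echo "fail"
            return 1
        fi
    done
    echo "ok"
}

r1=$(check 'pot$'        "pot spot apot"          "potic spots potatoe")
r2=$(check 'p+'          "arp99 apple zipper"     "zoo arive attack")
r3=$(check 'arc|cap|zoo' "arcane capper zoology"  "air coper zoloc")
r4=$(check '[0-9]th.+'   "0th/pt 3th/tc 9th/pt"   "0/nm 3/nm 9/nm")
r5=$(check '^[A-Z]a.*i+' "Hawaii Dario Ramiro"    "hawaii Ian Alice")

echo "$r1 $r2 $r3 $r4 $r5"
# prints: ok ok ok ok ok
```

The patterns are written in single quotes here so that the shell passes characters such as $ to the function untouched.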
