New Courses: Learn Command Line Fundamentals for Data Science
In a field as tightly connected to computer science as data science is, being able to control your computer like a developer can be a very valuable asset. The Unix command-line interface (CLI; you’ll also see it called the terminal or bash, shell, etc.) allows us to do that and much more.
That’s why we’re launching two brand new courses covering the fundamentals of the command line for our Data Analyst in Python and Data Scientist in Python paths:
No prerequisite knowledge is required to take these courses.
What Will You Learn?
In these two command line courses, you’ll learn to use the Unix terminal interface that’s built-in on Mac and Linux machines. Don’t worry, we’ll also provide Windows users with the tools required to take full advantage of the content.
In the first course, you’ll learn what the command line interface is, why it is important in the data science workflow, and how you can navigate and manage your computer by giving it instructions called commands. You’ll also learn about wildcards and how to use them together with commands like ls, mv, cp, mkdir and many more for faster searches and workflows.
The second course is focused on basic text processing in the shell, using commands like head, cat, cut and grep. It covers how you can combine these commands to create powerful chains of commands from simpler building blocks. You’ll also learn about multi-user systems and the power of output redirection.
And as is the case for all Dataquest courses, these new command line courses use an interactive command line environment and answer-checking to allow you to apply and check everything you’re learning from directly within your browser.
12 Reasons to Learn the Command Line
Making the switch from graphical user interfaces (GUIs) to a CLI can feel overwhelming, but we’re here to help you! To give you a jump-start, here are a few reasons why you should be learning the command line.
1. Command Line Skills Are Popular and Pay Handsomely
According to 2018’s Stack Overflow’s Developer Survey, bash/shell (i.e. the family of Linux command language interpreters) is the sixth most used language overall, ranking ahead of Python and R. It was also associated with higher salaries than either Python or R, according to the survey.
It also made the list of the most wanted and most loved technologies, while staying clear of the most dreaded technologies list.
And while StackOverflow’s survey covers software developers and engineers of all sorts, the command line is of particular relevance for data scientists because Bash/Shell correlates heavily with Data Science technologies like Python, IPython/Jupyter, TensorFlow and PyTorch. This is also supported by the most recent Python Developers Surveyconducted by Python Software Foundation.
2. Command Line Skills Help With Building Repeatable Data Processes
Part of a data scientist’s role is to make sure certain information is available regularly, often daily. Most of the time this data is acquired, processed and displayed in the same way.
The command line is well suited for this purpose because commands are easily automated and replicated.
Consider the following situation. Your employer decided to invest in data analytics. Several data professionals will be joining the team. You are tasked with making sure that their machines have everything they need to get started. If you can work with a CLI (command language interpreter) you can write a few scripts that will install, configure and test everything automatically. If you don’t, you’ll have to resort to a GUI and make the same mouse and click movements, repeatedly, across several machines.
That’s just one example of how terminal skills can help make data science processes more scalable and repeatable.
3. Command Line Skills Make You More Flexible
In a data science role, you’ll often find you have more flexibility if you can use the terminal rather than having to rely on clicking through GUIs. Since the command line is a program that runs other programs (hence the name “shell”), the interaction between programs is often easier to adjust in the command line. Once you’ve mastered command line commands, it’s relatively easy to write scripts, and shell scripts make building all sorts of data pipelines and workflows much simpler.
More broadly, knowing how to use the shell gives you a second option for interacting with your computer. You can always use the GUI when you prefer, but the command line can provide you with more direct power and control for those times when you need it.
4. Working With Text Files is Easier
Text files are among the most common ways methods to store and handle data, and almost any data science project is going to involve some work with text files. Being able to handle text files quickly and efficiently is thus a very useful skill for a data scientist.
The shell has very powerful text processing tools like AWKand sed, which help with getting acquainted with files and facilitate data cleaning.
For example, the code below uses AWK to print the first and third columns of a file named a_csv_file, where the second field’s value is Dataquest, using a comma as a field separator.
awk ‘BEGIN {FS=”,”} {if ($2==”Dataquest”) {print $1 $3} } a_csv_file’
All it takes is one line of code!
5. It’s Less Resource-Intensive
When you’re working with limited computing resources or simply want to maximize your speed, the using the command line is virtually always going to be better than using a GUI because using a GUI means resources must be dedicated to rendering the graphical output.
This is true both for working locally and remotely. When connecting remotely, GUIs consume much more bandwidth than terminals, wasting resources. Moreover, latency, i.e. the “time interval between the stimulation and response”, will be higher when using a GUI, which can be particularly frustrating if you’re trying to control a mouse that’s a second or two behind your actual movements. If you’re just typing in the command line, the latency is likely to be lower and it will also be easier to handle since you know precisely where your cursor is at any given time.
6. You Need Command Line Skills for the Cloud
Cloud services often are connected to and operated through a command line interface. This is particularly important for more advanced data science work like deep learning, where your local computing resources are likely to be insufficient for the tasks you’d like to perform. To quote from this 2018 article by Nucleus Research:
In last year’s research, fewer than 10 percent of [deep learning] projects were being run on premise. That trend has accelerated, with only 4 percent of projects running on-premise in 2018.
According to the same article, “96 percent of deep learning today is running in the cloud.” If you’re interested in learning advanced techniques like deep learning, command line skills will be necessary for moving your data to and from the cloud efficiently.
7. Unix Shell Skills Transfer Well to Other Shells
There are just a few popular shells (bash, zsh, fish, ksh, tcsh, cmd, Windows PowerShell, etc.) and they are more alike than they are different, making it easy to switch between them. This is particularly useful when you’re using online services that require some kind of CLI. On the other hand, GUIs are endless, and learning one won’t necessarily help you learn any others.
8. You Can Probably Type Faster Than You Click
Research shows that mouse use plateaus rather quickly, while keyboard use, despite its steep learning curve, can be more efficient.
251 experienced users of Microsoft Word were given a questionnaire assessing their choice of methods for the most frequently occurring commands. Contrary to our expectations, most experienced users rarely used the efficient keyboard shortcuts, favoring the use of icon toolbars instead. A second study was done to verify that keyboard shortcuts are, indeed, the most efficient method. Six participants performed common commands using menu selection, icon toolbars, and keyboard shortcuts. The keyboard shortcuts were, as expected, the most efficient.
In other words: even if you feel like you’re working quickly through a GUI, there’s a good chance that at least for some tasks, you’ll be more efficient in the command line.
9. Auditing and Debugging is Easier
Because it is so easy to track all of your activity on the command line, auditing and debugging is much easier. You can easily look through the log to track every single action you took in the shell, whereas if a misclick leads to a mistake when you’re working with a GUI, there’s likely to be no record of it.
10. The Unix Shell is Available Everywhere
Although it’s only built-in on Mac and Linux machines, Windows users can still join in on the fun with tools like WSL, Cygwin and MinGW. That means that the command line skills you learn in these courses will be usable on virtually every computer you encounter (including your personal machine, no matter what operating system you use).
11. You Can Harness the Power of Language
When we interact with the computer through a terminal window, we use commands. These commands are part of a language and languages have tremendous expressive power. In an article for the Association for Computing Machinerycalled “The Anti-Mac Interface“, Jakob Nielsen makes an insightful analogy, which helps us compare CLIs to GUIs:
The see-and-point principle states that users interact with the computer by pointing at the objects they can see on the screen. It’s as if we have thrown away a million years of evolution, lost our facility with expressive language, and been reduced to pointing at objects in the immediate environment. Mouse buttons and modifier keys give us a vocabulary equivalent to a few different grunts. We have lost all the power of language, and can no longer talk about objects that are not immediately visible (all files more than one week old), objects that don’t exist yet (future messages from my boss), or unknown objects (any guides to restaurants in Boston).
He continues.
If we want to order food in a country where we don’t know the language at all, we’re forced to go into the kitchen and use a see-and-point interface. With a little understanding of the language, we can point at menus to select our dinner from the dining room. But language allows us to discuss exactly what we would like to eat with the waiter or chef. Similarly, computer interfaces must evolve to let us utilize more of the power of language. Adding language to the interface allows us to use a rich vocabulary and gives us basic linguistic structures such as conditionals.
The command line allows us to use the power of language to interact with the computer in a more refined, elegant and efficient manner.
The author concludes the analogy by saying that “real expressive power comes from the combination of language, examples, and pointing.” In other words, you will be more efficient if you can use a healthy combination of a CLI and GUIs. The point of these command line courses isn’t to get you to abandon GUIs entirely, it’s to give you another tool in your arsenal, to make you more efficient with particular tasks and workflows.
12. The Command Line is Simpler Than You Think
There is a misconception that using the command line requires you to know several hundred commands. In fact, although there are hundreds of commands available for use, you’re likely to need just a tiny percentage of these commands to do most of common data science tasks.
Still not convinced? We’ll leave you with a quote from the great (and free!) book The Linux Command Line:
When I am asked to explain the difference between Windows and Linux, I often use a toy analogy.
Windows is like a Game Boy. You go to the store and buy one all shiny new in the box. You take it home, turn it on, and play with it. Pretty graphics, cute sounds. After a while, though, you get tired of the game that came with it, so you go back to the store and buy another one. This cycle repeats over and over. Finally, you go back to the store and say to the person behind the counter, “I want a game that does this!” only to be told that no such game exists because there is no “market demand” for it. Then you say, “But I only need to change this one thing!” The person behind the counter says you can’t change it. The games are all sealed up in their cartridges. You discover that your toy is limited to the games others have decided that you need.
Linux, on the other hand, is like the world’s largest Erector Set. You open it, and it’s just a huge collection of parts. There’s a lot of steel struts, screws, nuts, gears, pulleys, motors, and a few suggestions on what to build. So, you start to play with it. You build one of the suggestions and then another. After a while you discover that you have your own ideas of what to make. You don’t ever have to go back to the store, as you already have everything you need. The Erector Set takes on the shape of your imagination. It does what you want. Your choice of toys is, of course, a personal thing, so which toy would you find more satisfying?
Now that we’ve convinced you to learn the command line, click here to start learning! These new courses do require at least a Basic subscription, but as of this writing, Dataquest’s Premium subscription is on sale for 50% off — that will give you access to both courses, as well as every other course we offer and some other cool benefits.