Data professionals (engineers, scientists, and analysts) constantly receive data from various sources and must quickly decide whether it is worth working on. This is akin to doctors who see patients presenting multiple symptoms: almost universally, the first thing a doctor reaches for is the stethoscope. Data professionals, by contrast, jump straight to writing code in Python, R, or some other programming language, or, even worse, try to analyze the data in spreadsheets, even though simple command-line tools are already at their disposal.
This course introduces data professionals to the world of exploring data on the command line.
There are plenty of courses that teach command-line tools, and a few that deal with data on the command line. But almost all of them make one of two mistakes:
They introduce a lot of commands, but without proper data to work on, the learner does not know which command to use and when.
They take the stance that everything can be done on the command line, which is the golden-hammer trap. The command line is great for initial analysis, but beyond that you will need a programming language.
This course instead takes the publicly available dataset from IMDB, the movie, TV show, and OTT database, and analyzes it on the command line. The same dataset is used throughout the tutorial, because that is what you would do in practice when you receive a dataset.
This is a practical tutorial that works in a Unix-based terminal, so set one up before you begin. If you are working on macOS or Ubuntu, you are already set; if you are on Windows, you can install the Windows Subsystem for Linux (WSL). You do not have to be conversant with the Unix command line, but you should have the patience to stick with it, as the command line is an acquired taste. Occasionally a tool used in this tutorial will not be pre-installed on your machine; in such cases, find a way to install it. All the tools used are widely available, so you should not hit any roadblocks.