< Back to Projects

Minishell

C Shell Parsing Linux
View Source Code on GitHub
Screenshot of the custom Minishell running commands.

Introduction

The Minishell project is a comprehensive reimplementation of a core component of the Unix environment: the command-line interpreter (shell). Replicating many of the fundamental features of Bash shell, this project serves as a deep dive into system programming, demanding meticulous attention to process management, effective I/O redirection, and complex command line parsing.

It was completed during the 42 common core, in collaboration with jrandet.

At its core, a shell operates in a continuous execution loop, defined by three main phases:

1. Waiting: The shell displays a prompt and awaits user command input.
2. Receiving: The command-line input is captured as a raw string.
3. Processing: The input is then analyzed through a precise, multi-stage pipeline—from parsing to final execution—demonstrating a practical understanding of how operating systems manage user commands and processes.

Key Challenges and Features

During this project, we were asked to:

Command Line Parsing

The initial and most critical phase of the shell is transforming the raw input string into a structured, executable format.

Initial Tokenization and Testing Blueprint

Before implementing the parser, I needed a complete understanding of how Bash tokenizes complex command lines. I created the visualization below as a rigorous testing blueprint in Bash. This systematic analysis of distinct command scenarios served as the crucial pre-development stage.

Tokenization and Structural Mapping

Diagram showing various commands broken down into structured components: bin, arg, input redir, file descriptors, and pipe.
Visual representation of different command-line inputs are decomposition, done with Obsidian's canva plug-in.

Data Structure Choice

Instead of a complex Abstract Syntax Tree (AST), we chose an array of command structures. This array-based approach significantly simplifies pipeline execution, as each element directly maps to a stage in the command chain (e.g., cmd1 | cmd2), making process forking and piping straightforward.

The Central Command Structure (`t_cmd`)

The entire pipeline is built around the primary command structure, which encapsulates all necessary information—binaries to execute a single command.

		typedef struct s_cmd
		{
			char		*command;	   // Binary name (e.g., "ls" or "echo")
			char		*path;		  // Full binary path (e.g., "/bin/ls")
			int		 type;		   // Command type Enum (USER, BUILTIN, or INVALID)
			char		**args;		 // Argument list (argv equivalent)
			int		 arg_amount;	 // Number of arguments in the command line
			t_redir	 *redir;		 // I/O redirection (path and type)
			int		 redir_amount;   // Total number of redirections
			int		 heredoc_amount; // Number of heredoc redirections
			pid_t	   pid;			// Process ID, used during execution
			t_pipefd	*pipe_in;	   // int fdes[2] for input pipe redirection
			t_pipefd	*pipe_out;	  // int fdes[2] for output pipe redirection
			bool		is_directory;   // Check if command is a directory (error handling)
			int		 error_access;   // Error number if any access issue occurs
		} t_cmd;
		

Parsing Examples in Action

The array-of-structures model proves its efficiency when handling complex command lines that involve multiple pipes (|) and redirections (<, >).

Example 1: Long Pipeline

ls -l | grep "config" | tee configs.txt | wc -l

Command 1: ls
Path: /usr/bin/ls
Args: ["ls", "-l"]
Total Args: 2 | Total Redirections: 0
Command 2: grep
Path: /usr/bin/grep
Args: ["grep", "config"]
Total Args: 2 | Total Redirections: 0
Command 3: tee
Path: /usr/bin/tee
Args: ["tee", "configs.txt"]
Total Args: 2 | Total Redirections: 0
Command 4: wc
Path: /usr/bin/wc
Args: ["wc", "-l"]
Total Args: 2 | Total Redirections: 0

Each command is an individual t_cmd structure. The shell's execution automatically handles the necessary pipes to connect their input/output streams.

Example 2: Redirection & Pipes

< infile.txt grep a1 | wc -w > outfile.txt

Command 1: grep
Path: /usr/bin/grep
Args: ["grep", "a1"]
Total Args: 2
Redir[0]: infile.txt (IN REDIR)
Command 2: wc
Path: /usr/bin/wc
Args: ["wc", "-w"]
Total Args: 2
Redir[0]: outfile.txt (OUT REDIR)

The initial command reads from a file, the output pipes to the second command, and the second command's final output is redirected to a new file.

Quote and Variable Expansion

One of the subtle complexities of shell programming is mastering the rules for single quotes ('), double quotes ("), and variable expansion ($).

Our core challenge was ensuring the shell behaves exactly like Bash:

Minishell vs. Bash Comparison

Comparison of Minishell and Bash output for a command containing single and double quotes.
A side-by-side comparison of a complex echo command executed in Minishell Versus Bash.

The Execution Phase: Fork, Pipe, and Execve

This phase is responsible for correctly running user-defined binaries and built-in commands, managing I/O redirection, and handling pipelines.

Process Management

To run any external command (a non-built-in like /bin/ls), the shell uses a two-step approach:

  1. fork(): The parent shell process calls fork() to create an exact copy of itself (the child process) in order to create a new process for the command.
  2. execve(): Inside the child process, execve() is called. This function replaces the running program, hence the necessity of fork().
// char *cmd.path = "/bin/ls"
// char **cmd.args = {"ls", "-l", NULL}
// char **env = {"PATH=...", "USER=...", NULL}

execve(cmd.path, cmd.args, env);

The parent shell then waits for the child process to terminate, retrieves its exit status that will be expandable using "$?".

Handling Pipelines with pipe()

Pipelines (cmd1 | cmd2) introduce a layer of complexity known as Inter-Process Communication (IPC). The shell uses the pipe() system call to create a connection between the standard output (stdout) of the first command and the standard input (stdin) of the second.

For a sequence of commands, the process involves:

This meticulous orchestration of fork(), pipe(), and file descriptor manipulation is crucial for the shell's ability to run complex chained commands efficiently.