Git Tutorial

A Tour of the Basics of the Git Version Control System, with a Hands-on Lab

In this tutorial, I am going to cover the basics of Git with a hands-on approach. Hopefully, at the end you will have a good understanding of Git's fundamentals and you will be able to run Git to version your own projects!

There are several excellent tutorials and books about Git, some of which are cited in the references at the end of the post. However, none of them provides an 'instant lab' where you can immediately get your hands dirty.

That's why this tutorial was born: to let you learn and experiment at the same time.

Once you have practiced with this tutorial, I suggest that you move on to those references, where you can find more in-depth explanations that are not duplicated here.

A bit of theory

According to Wikipedia,

Git is a distributed version control system for tracking changes in source code during software development
Let's dissect this statement:

Version control system: for a computer scientist, a version control system (VCS) is a tool that tracks the changes made to files and computer programs, so that it is possible to navigate forwards and backwards among different revisions. Each revision is basically a set of files with a unique identifier, an informative message, a timestamp and the name of the user who created it attached. A VCS lets you see the differences between distinct revisions and easily revert to a previous revision if, for instance, an error was introduced in the latest one.

For tracking changes in source code during software development: as we were saying, a VCS allows us to move backwards and forwards between distinct revisions, or versions, and to see the changes between them. Suppose that I have a working version of my project called V01. Then I add some features and create a new version called V02. Unfortunately, it turns out that I broke something and the project no longer works at version V02. Without Git, I would have to find out on my own, through logs and debugging, which additions or changes actually broke the system. If I use Git instead, I can simply ask it to show me the differences between the two versions. By looking at those differences, I can quickly spot the faulty parts!
But the benefits do not end here. Once I have found where the problem is, I can revert to version V01, so that the project keeps working in production, and in the meantime work on fixing the problems that arose in version V02. When those bugs have been fixed, I can deploy V02 to production with virtually no downtime! :)
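
As a rough illustration of what 'seeing the differences between two versions' means, the sketch below compares two made-up revisions of a file with Python's standard difflib module. The file contents and the labels V01 and V02 are placeholders invented for this example, and Git computes its diffs with its own machinery, so this only mimics the idea (Python 3).

```python
import difflib

# Two hypothetical revisions of the same file, as lists of lines.
v01 = ["def area(w, h):\n", "    return w * h\n"]
v02 = ["def area(w, h):\n", "    return w + h\n"]  # the bug introduced in V02

# unified_diff yields header lines plus the lines that differ between the revisions.
diff = list(difflib.unified_diff(v01, v02, fromfile="V01", tofile="V02"))
print("".join(diff))
```

In the printed result, the line removed from V01 is prefixed with a minus sign and the line added in V02 with a plus sign, which is exactly the kind of hint that points you to the faulty change.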

Backups: Git (and VCSs in general) provides a much more convenient way of dealing with backups during software development. The alternative to using a VCS would be to make a separate backup for each version of the software. This means that you would probably end up with an 'OLD' folder containing all the previous, timestamped versions. This approach consumes a lot of disk space, makes it hard to navigate between versions and is terribly error prone. By the way, I once worked at a multinational company that managed backups this way: making timestamped archives and then copying them to a centralized server by hand. Can you guess how much time is wasted looking for the latest backup when someone forgets to copy it to the server?!


Distributed: the distributed nature of Git is one of the reasons for its success compared to other VCSs. The first VCSs were local: developers could track changes between distinct revisions, but there was no easy way to share code.
The second generation of VCSs was centralized: there was a single, version-controlled repository that developers accessed remotely. Each developer could easily share code with the others, but this model suffers from concurrent access (and the resulting conflicts) and from the single point of failure of the central repository.
So we get to distributed VCSs, like Git. Here each client maintains a full repository, so no single point of failure exists, and everyone can get a full copy from any other client. Furthermore, Git provides several features to share code with other clients and to manage the conflicts that may arise.

Lab

To start and stop the lab, use the buttons below:

Let's start

Click the 'Start Lab' button to start the Linux emulator. This emulator was developed by Fabrice Bellard: you can find out more in the references.
We will use Linux for practice. If you are unfamiliar with it, don't worry: I will briefly describe every command we use.
Let's create a test folder to host our Git repository:

mkdir test

Then move into the newly created folder:

cd test

And now let's initialize a new git repository. This is as simple as typing

git init .

where the dot tells git to initialize the repository in the current folder.
Now the repository is initialized and if you list files with

ls -a

you should see a directory called .git. This directory holds Git's internal representation of the repository. For example, the database where Git saves the committed files (we'll come back to the definition of commit in a moment) lives inside this folder.

Let's start writing some code. We will use Python to implement a very basic encryption and decryption system in a few lines of code. The encryption scheme that we are going to develop to test Git's features is about 2000 years old and takes its name from the Roman general Julius Caesar, who used it to obfuscate his letters.

The Caesar Cipher works by substituting each letter in a message with another letter which is at a fixed distance from it. So, as an example, assuming a fixed offset of 4, the letter A of the alphabet becomes E, the letter B becomes F and so on. Letters at the end of the alphabet wrap around, so that letter Z becomes D and so on.
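
The substitution rule can be sketched in a few lines of Python 3. This is only an illustration of the arithmetic, separate from the scripts used in the lab; the function name shift_letter is made up for this example.

```python
import string

ALPHABET = string.ascii_uppercase  # 'A' to 'Z'

def shift_letter(letter, offset):
    """Return the letter found `offset` positions after `letter`, wrapping at Z."""
    position = ALPHABET.index(letter)  # A -> 0, B -> 1, ..., Z -> 25
    return ALPHABET[(position + offset) % len(ALPHABET)]

for letter in "ABZ":
    print(letter, "->", shift_letter(letter, 4))
# A -> E
# B -> F
# Z -> D
```

The modulo operation is what makes the letters at the end of the alphabet wrap around to the beginning.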

We will write very basic Python code, so a minimal knowledge of imperative programming should be enough to follow the tutorial.

Let's start with the encryption. All we want to do is encrypt a plaintext message using the Caesar Cipher.

This is what the short encrypt.py script does. Without digging too much into the details, the code works as follows:

First, the user is prompted to enter the plaintext (the first raw_input call). Then the user is asked for the offset by which to shift the letters. This happens in a while loop, so the prompt is repeated until the user enters a valid, non-negative integer. At this point, the letters of the plaintext are shifted by the specified offset one by one (the for loop). Finally, the encrypted text is displayed (the final print).

encrypt.py

# Caesar cipher encryption for lowercase letters and spaces.
# Written for Python 2 (raw_input); on Python 3, replace raw_input with input.

def encrypt(char, shift):
    """Encrypt a single character; spaces are returned unchanged."""
    if not is_valid(char):
        raise Exception('Invalid character ' + char)
    if is_space(char):
        return char
    return shift_char(char, shift)

def is_valid(char):
    """Accept only the space (ASCII 32) and lowercase letters a-z (ASCII 97-122)."""
    char_ord = ord(char)
    return char_ord == 32 or 97 <= char_ord <= 122

def is_space(char):
    return ord(char) == 32

def shift_char(char, shift):
    normalized_ord = ord(char) - 97  # From 0 to 25
    shifted_ord = (normalized_ord + shift) % 26  # Wrap around after 'z'
    new_ord = shifted_ord + 97
    return chr(new_ord)

def convert_to_int(code):
    """Return code as an int, or unchanged if it is not a valid integer."""
    try:
        return int(code)
    except ValueError:
        return code

text = raw_input('\nType the text to be encrypted\n')
code_shift = ''
# Keep asking until the user types a non-negative integer.
while type(code_shift) != int or code_shift < 0:
    code_shift = raw_input('\nType the code shift. It must be an integer number\n')
    code_shift = convert_to_int(code_shift)
encrypted_text = ''
for char in text:
    encrypted_text += encrypt(char, code_shift)
print('\nThe encrypted text is:\n' + encrypted_text)


Decryption is done by the decrypt.py script, which mirrors the encryption script.

First, the user is prompted to enter the encrypted text (the first raw_input call). Then the user is asked for the offset needed to decrypt it, which must be the same offset used during encryption (the while loop). At this point, each letter of the encrypted text is scanned and turned back into the original, plain letter (the for loop). Finally, the decrypted text is shown (the final print).

decrypt.py

# Caesar cipher decryption for lowercase letters and spaces.
# Written for Python 2 (raw_input); on Python 3, replace raw_input with input.

def decrypt(char, shift):
    """Decrypt a single character; spaces are returned unchanged."""
    if not is_valid(char):
        raise Exception('Invalid character ' + char)
    if is_space(char):
        return char
    return shift_char(char, shift)

def is_valid(char):
    """Accept only the space (ASCII 32) and lowercase letters a-z (ASCII 97-122)."""
    char_ord = ord(char)
    return char_ord == 32 or 97 <= char_ord <= 122

def is_space(char):
    return ord(char) == 32

def shift_char(char, shift):
    normalized_ord = ord(char) - 97  # From 0 to 25
    shifted_ord = (normalized_ord - shift) % 26  # Shift back, wrapping before 'a'
    new_ord = shifted_ord + 97
    return chr(new_ord)

def convert_to_int(code):
    """Return code as an int, or unchanged if it is not a valid integer."""
    try:
        return int(code)
    except ValueError:
        return code

text = raw_input('\nType the text to be decrypted\n')
code_shift = ''
# Keep asking until the user types a non-negative integer.
while type(code_shift) != int or code_shift < 0:
    code_shift = raw_input('\nType the code shift. It must be an integer number\n')
    code_shift = convert_to_int(code_shift)
decrypted_text = ''
for char in text:
    decrypted_text += decrypt(char, code_shift)
print('\nThe decrypted text is:\n' + decrypted_text)

			  

Now let's start playing with our lab!

Assuming that you are inside the test folder, let's create the two encryption/decryption scripts.

First, let's create the two empty files. Type the following commands in the lab window:

touch encrypt.py

touch decrypt.py

Now let's copy the previous code into the newly created files.

Let's copy the text for encrypt.py by clicking on the button below.



Then, type

nano encrypt.py

in the lab window to open the file encrypt.py with the nano text editor.
Use the outer scroll bar of the lab window to scroll down, and paste the copied text into the Paste Here box. Make sure that the last line is copied too (if it is not, type it by hand).
Finally, save the changes and close by hitting

Ctrl+X

and then type

Y

to save the buffer, and finally hit Enter to close the editor.

Now, let's do the same with decrypt.py. Let's copy the text for decrypt.py by clicking on the button below.



Then, type

nano decrypt.py

in the lab window to open the file decrypt.py with the nano text editor.
Use the outer scroll bar of the lab window to scroll down, and paste the copied text into the Paste Here box. Make sure that the last line is copied too (if it is not, type it by hand).
Finally, save the changes and close by hitting

Ctrl+X

and then type

Y

to save the buffer, and finally hit Enter to close the editor.

At this point we can test our encryption/decryption system.
Let's run the encryption first. To start the encryption type

python encrypt.py

and type the text to be encrypted, such as super secret text. Please note that only lowercase letters and spaces are supported. Then enter 5 as the code shift. As a result, you'll get xzujw xjhwjy yjcy.
That's your encrypted text! :)

Now let's demonstrate that decryption works as expected.
To start decryption type

python decrypt.py

and type the text to be decrypted, that is, xzujw xjhwjy yjcy. Then enter 5 as the code shift. As a result, you'll get back your original plaintext, super secret text.
That's your decrypted text! :)
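
The same round trip can also be checked without typing anything at the prompts. The sketch below is written for Python 3 and re-implements the shift in a single hypothetical helper, caesar, rather than importing the lab scripts (which would start asking for input as soon as they are imported).

```python
def caesar(text, shift):
    """Shift lowercase letters by `shift` positions; leave spaces untouched."""
    result = []
    for char in text:
        if char == " ":
            result.append(char)
        elif "a" <= char <= "z":
            result.append(chr((ord(char) - ord("a") + shift) % 26 + ord("a")))
        else:
            raise ValueError("Invalid character " + char)
    return "".join(result)

encrypted = caesar("super secret text", 5)
decrypted = caesar(encrypted, -5)  # decrypting is shifting back by the same amount
print(encrypted)  # xzujw xjhwjy yjcy
print(decrypted)  # super secret text
```

Using a negative shift for decryption is a shortcut taken in this sketch; the lab keeps two separate scripts instead.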

Ok, now that we have our very basic encryption/decryption system in place, let's save our changes with Git, to build a snapshot of this working version of the system.

To see the state of our git repository type

git status

Output:
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        decrypt.py
        encrypt.py

The output tells us that we are on the master branch, which Git created for us. A branch is basically a pointer to a snapshot. Branches may be used to isolate the development of new features from the main, working branch.
The second line tells us that no commits have been made yet. A commit in Git is a snapshot: a revision of the files that you can come back to later if needed.
The last part tells us that there are two untracked files in the directory: encrypt.py and decrypt.py. Untracked files are files that are not under Git version control. If you delete untracked files, you won't be able to recover them with Git. Furthermore, they will not appear in any of your commits!

Before issuing other commands, it is a good idea to tell Git who we are. This way, every commit is associated with the developer who created it. We will use the --global flag to tell Git that these settings apply to every repository of the current operating system user. In other words, we need to run these configuration commands only once:

git config --global user.name "balzu"

git config --global user.email "thebytemachine@gmail.com"

Feel free to replace the username and the email with your own.

So what we want to do now is make Git track these files. This is also a good point to introduce the three core areas of Git. The first is the Working Tree.

The Working Tree is the directory tree that contains the files you are editing. In this case, the test folder is our working tree. As Git told us, the two files we created there are still untracked, so the next thing we are going to do is make Git track them. This is done by typing the following command:

git add encrypt.py decrypt.py

We can look at the new state of the Git repository by typing

git status

Output:
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

        new file:   decrypt.py
        new file:   encrypt.py

The output tells us that we are still on branch master and that no commit has been made so far.
The difference from the previous output is that the two files encrypt.py and decrypt.py have been moved to the area holding the changes to be committed.

This area is the second Git core area and is called Staging Area (or Index). Files added to the staging area are not yet added to the project history (commits). Instead, adding files to the staging area allows the developers to group (logically related) files that will end up in the same commit.

OK, we are satisfied with our encryption/decryption scripts: we have implemented the Caesar Cipher and verified that it works correctly. It is now time to store this first snapshot of the software persistently, so that if something goes wrong during further development we can still go back to this working version. We can do this with the git commit command:

git commit -m 'First working implementation of the Caesar Cypher. The supported characters are all lowercase alphabetic characters plus the whitespace.'

Every time you commit a snapshot, you must provide a message for that commit. The message should be informative, so that you can quickly recall what a specific commit contains without inspecting it in depth. The -m flag lets you specify the message inline.

The files that we committed ended up in the git directory, which is the last of the three core git areas. The git directory is the directory named .git inside your working tree. It is where the project's metadata and database are stored. Every time that you commit a snapshot, you actually add files to the git object database, that is located in the .git/objects folder.

That's it! We wrote our code, told Git to track those files, included them in a snapshot and finally stored that snapshot in the Git database, so that we are never going to lose it from our software development history.
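
To fix the idea of the three areas, here is a toy model in Python 3. It is a deliberately simplified sketch, not how Git is implemented: the class ToyRepo and its methods are invented for this example, and files are plain strings held in dictionaries.

```python
from dataclasses import dataclass, field

@dataclass
class ToyRepo:
    working_tree: dict = field(default_factory=dict)  # file name -> content being edited
    index: dict = field(default_factory=dict)         # staging area: content of the next commit
    commits: list = field(default_factory=list)       # history: list of (message, snapshot)

    def add(self, *names):
        """Copy the current content of the given files into the staging area."""
        for name in names:
            self.index[name] = self.working_tree[name]

    def commit(self, message):
        """Store a copy of the staging area in the history."""
        self.commits.append((message, dict(self.index)))

    def untracked(self):
        """Files present in the working tree but unknown to the staging area."""
        return sorted(set(self.working_tree) - set(self.index))

repo = ToyRepo()
repo.working_tree["encrypt.py"] = "# encryption code"
repo.working_tree["decrypt.py"] = "# decryption code"
print(repo.untracked())   # ['decrypt.py', 'encrypt.py']

repo.add("encrypt.py", "decrypt.py")
print(repo.untracked())   # []

repo.commit("First working implementation")
print(len(repo.commits))  # 1
```

The point of the model is the direction of travel: content moves from the working tree to the staging area with add, and from the staging area to the history with commit.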

Do you trust me? Let's verify it together!

Test

In this simple test, we are going to 'accidentally' lose a file. We will then use Git's database to recover it and save ourselves hours of panic and additional work!

Move into the test folder:

cd /root/test

Now, verify that inside the folder you have both files encrypt.py and decrypt.py:

ls

It's now time to accidentally delete a file:

rm decrypt.py

Now let's verify that we actually deleted that file:

ls

In the console output, we now see only the encrypt.py script. Oh, no! We lost our decryption script! How will we decrypt all the files that we previously encrypted? :(

Don't worry, Git will help us fix this problem!
The command that we need is git log: it lists all the commits in a repository, starting from the most recent one. In our case, only one commit will be present:

git log

Output:
commit 3231edb982ccffe57ea8f3ace32586aeae5cc2c4 (HEAD -> master)
Author: balzu
Date:   Sat Apr 4 11:22:52 2020 +0200

    First working implementation of the Caesar Cypher. The supported characters are all lowercase alphabetic characters plus the whitespace.

The long string next to 'commit' is the SHA-1 checksum of the commit. Git computed it from the content of the commit when we ran the git commit command, and it uniquely identifies that commit. HEAD and master refer to Git branches: here, HEAD -> master means that HEAD, which marks the currently checked-out branch, points to the master branch. Then we have the author, the date and the message of the commit.
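
For the curious, the way Git derives such identifiers can be sketched in Python 3 with the standard hashlib module. Git hashes a short header (the object type and the content length, followed by a null byte) together with the content itself. The example uses a file-content object (a 'blob') because it is the simplest case; a commit object is hashed in the same way, with the header type commit and a body that records the snapshot, the parent commits, the author and the message. The function name git_object_id is an assumption made for this sketch.

```python
import hashlib

def git_object_id(content, object_type="blob"):
    """Compute the SHA-1 identifier Git assigns to an object with this content."""
    header = f"{object_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# An empty file always gets the same, well-known identifier.
print(git_object_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier depends only on the type and the content, two identical files get the same identifier, and any change to the content produces a different one.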

We want to go back to that commit. Since there are no other commits in between, making all tracked files match the most recent commit is easy. Just type

git reset --hard

Keep in mind that this command discards every uncommitted change to tracked files, not only the deletion.

Now we can verify that decrypt.py is back in our test folder:

ls

Output:
decrypt.py encrypt.py

That's it! We actually recovered our decryption script. Our encrypted files are not lost!
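
Conceptually, what just happened can be pictured like this. The sketch is a toy model in Python 3, not Git's actual implementation: the dictionaries head_snapshot and working_tree and the function reset_hard are made up for the example, and the file contents are placeholders.

```python
# Snapshot stored by the last commit: file name -> content (placeholder contents).
head_snapshot = {"encrypt.py": "# encryption code", "decrypt.py": "# decryption code"}

# The working tree starts as a copy of that snapshot.
working_tree = dict(head_snapshot)

def reset_hard(snapshot):
    """Return a fresh working tree holding the files of the committed snapshot."""
    return dict(snapshot)

del working_tree["decrypt.py"]   # the 'accidental' rm decrypt.py
print(sorted(working_tree))      # ['encrypt.py']

working_tree = reset_hard(head_snapshot)
print(sorted(working_tree))      # ['decrypt.py', 'encrypt.py']
```

The committed snapshot is never touched by the deletion, which is why the file can be brought back.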

Conclusion

We went through an interactive Git tutorial to get acquainted with some of the most common Git commands and use cases. Of the many Git tutorials available, none (to the best of my knowledge) provides an environment in which to instantly apply what you learn: both printed and online tutorials require you to have a working environment, with Git and possibly other prerequisites installed, before you start. Since this is a beginner tutorial, I tried to deploy an interactive environment where theory and practice go hand in hand, in the belief that this speeds up learning because it lets you skip setup and installation.

We learned some of the basic use cases of the local Git workflow, that is, using Git on a single machine. This only scratches the surface of Git: for some excellent Git tutorials, look in the References.

Please let me know if you liked this tutorial by leaving feedback on the YouTube video page or by email at thebytemachine@gmail.com

References