PY4E – Python For Everybody
Learning Python with Dr. Chuck's free course
First of all, I would like to thank Dr. Chuck for sharing his knowledge by creating this free online course.
NOTE: The following are just my study notes.
#1 – Why use a Programming language
Programmers write a sequence of instructions (code) to solve problems.
Computers take what we type literally, so they can't guess what we meant: every error has to be fixed by us.
~ What the Hardware does:
– CPU: asks “what next?” and executes the instructions
– MAIN MEMORY (RAM): stores the instructions (fast, temporary storage)
– SECONDARY MEMORY (HDDs or SSDs): stores the instructions (slow, permanent storage)
– MOTHERBOARD: connects all the components
~ What the language software does:
– You write a file/program
– It gets loaded into RAM
– Python translates it into instructions the CPU understands
– When the program runs, Python shows you the result (the output)
~ Python (the ATOM text editor is recommended in the course for writing Python)
You can run Python interactively inside the terminal, but generally you will type the script in a text editor, save the file with a .py extension and then run it.
If you make a mistake, Python will display a “SyntaxError” message to tell you where the mistake was made.
~ Elements of python:
– Vocabulary/words : reserved and variables
– Sentence structure : valid syntax patterns
– Story structure : construct a purposeful program
~ Vocabulary: reserved words (you can’t use them for any other purpose, e.g. as variable names):
False – class – return – is – finally – None – if – for – lambda – continue – True – def – from – while – nonlocal – and – del – global – not – with – as – elif – try – or – yield – assert – else – import – pass – break – except – in – raise (since Python 3.7 also async and await).
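You don't have to memorize the list. The standard-library keyword module holds the reserved words of whatever Python version you are running. The sketch below checks a few names, which are just examples. Note that print is not on the list: in Python 3 it is a built-in function, not a reserved word.

```python
import keyword

# kwlist is the list of reserved words of the Python version you are running
print(keyword.kwlist)

# iskeyword() checks a single name before you use it as a variable name
for name in ['class', 'hours', 'True', 'print']:
    print(name, 'is reserved:', keyword.iskeyword(name))
```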
~ Sentences or Lines:
Example (# starts a comment: Python skips everything after it on that line):
~x = 2  #assignment statement
~x = x + 2  #assignment with expression
~print(x)  #print statement (prints 4)
~ Structure:
A script can run in sequence (one step after another), conditionally (some steps run only if a condition is true), or repeatedly (a few lines are repeated over and over in a loop).
A program combines these patterns together.
#2 – Variables, Expressions and Statements
~ Constants:
Constants are fixed values such as numbers, letters and strings (strings use single quotes (‘) or double quotes (“))
~ Variables:
A variable is a named place in memory where Python stores a value, so we can retrieve it later using its name.
Example: x = 2, but remember that if you then write x = 5, the new value overwrites the old one.
Variable names must start with a letter or _ and can contain letters, numbers and underscores (they are case sensitive).
– Accepted examples: variables variables21 _variables
– Unaccepted examples: 21variables @variables varia.bles
– Accepted but better avoided (case sensitivity makes these three different variables): vArIaBlEs Variables VARIABLES
It is good practice to use mnemonic variable names (mnemonic = memory aid), so everybody can understand what each variable holds.
For example, instead of “x” we can use “_hours” if we are referring to hours.
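Here is the same pay calculation written twice: once with meaningless names and once with mnemonic ones. This shows why mnemonic names matter. The numbers (35 hours at 12.50 per hour) are only illustrative.

```python
# Meaningless names: Python is happy, a human reader is not
x1q3z9ocd = 35.0
x1q3z9afd = 12.50
x1q3p9afd = x1q3z9ocd * x1q3z9afd

# Mnemonic names: the intent is obvious
hours = 35.0
rate = 12.50
pay = hours * rate
print(pay)  # 437.5
```

Python runs both versions the same way. Only the second one tells a human reader what is being computed.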
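~ Assignment statements
We use = to assign a value to a variable: x = 2 or x = 2 + x * 3 / 4 (the right side is evaluated first, using the current value of x, and the result is then stored in x).
The right side can be a numeric expression made with: () (parentheses), ** (power), * / % (multiplication, division, remainder), + - (addition, subtraction). They are listed in order of precedence; operators on the same level are evaluated from left to right (except **, which goes right to left).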
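A quick way to check the precedence rules is to evaluate a few expressions. This sketch uses made-up numbers, and the comments trace each step.

```python
x = 1 + 2 ** 3 / 4 * 5
print(x)            # 11.0: 2**3 = 8, 8/4 = 2.0, 2.0*5 = 10.0, 1 + 10.0 = 11.0

print((1 + 2) * 3)  # 9: parentheses first
print(17 % 5)       # 2: remainder of 17 divided by 5
print(10 - 4 - 3)   # 3: same level, left to right: (10 - 4) - 3
print(2 ** 3 ** 2)  # 512: the exception, ** groups right to left: 2 ** (3 ** 2)
```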
~ Type
Python recognizes the difference between numbers and strings, but it cannot, for example, add a string and a number (you get a traceback with a TypeError).
Calling type() tells you which type a value is: type(1) is ‘int’, type(1.0) is ‘float’, type(‘one’) is ‘str’.
But you can convert, for example:
float(1) / 1 = 1.0 / 1 = 1.0
You can convert, when possible, with int() or float(), as in the example above.
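This sketch shows type(), the conversions, and the two errors you typically meet:
- a TypeError when you mix a string and a number with +
- a ValueError when a string can't be converted

The try/except that keeps the script running is explained in chapter 3.

```python
print(type(1), type(1.0), type('one'))  # <class 'int'> <class 'float'> <class 'str'>

text = '123'
number = int(text)       # str '123' -> int 123, so arithmetic works
print(number + 1)        # 124
print(float(text) / 2)   # 61.5

try:
    print(text + 1)      # str + int is not allowed
except TypeError as err:
    print('TypeError:', err)

try:
    int('hello')         # not every string can be converted
except ValueError as err:
    print('ValueError:', err)
```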
~ Input
When we call input(), Python pauses and waits for the user to type an answer (which is returned as a string); once the user presses Enter, the code continues.
Example:
~_name = input('Who are you? ')  #the user types their name and presses Enter
~print('Welcome', _name)  #prints: Welcome Username
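input() always returns a string, so a number typed by the user has to be converted before you do math with it. This sketch is adapted from the course's elevator-floor idea. The helper name us_floor is hypothetical. The typed value is simulated so the code runs without a keyboard.

```python
def us_floor(europe_floor):
    # input() always returns a string, so convert before doing arithmetic
    return int(europe_floor) + 1

# In a real script: inp = input('Europe floor? ')
inp = '0'                          # what the user might type
print('US floor', us_floor(inp))  # US floor 1
```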
#3 – Conditional Execution
~ Conditional statement
We use if to create a condition, followed by one of these comparison operators: < (less than), <= (less than or equal), == (equal), >= (greater than or equal), > (greater than) and != (not equal)
Example:
~x = 5
~if x < 10:
~    print('Smaller')  #the spaces in front are the indentation
~if x > 20:
~    print('Bigger')
~print('Finito')  #prints: Smaller, then Finito
~ Blocks and indentation (4 spaces, or 1 Tab in the Atom text editor)
A block is a set of lines indented under a statement that ends with a colon, such as if or for.
The indentation is normally 4 spaces, but in some editors such as Atom you can press Tab to get it. All the lines of a block must have the same indentation to be part of the same block (and don't mix tabs and spaces).
One indentation:
~x = 5
~if x > 1:
~    print('bigger than one')  #increase indentation
~    print('still bigger')
~print('done with 2')  #decrease indentation
Double indentation or nested decision
~x = 5
~if x > 1:
~    print('bigger than one')
~    if x < 10:  #an if inside another if: nested decision
~        print('smaller than ten')
~
~print('finito')  #blank lines are ignored by Python; the dedent ends both blocks
Two-way decision
To make a two-way decision use else: exactly one of the two blocks runs, this one or that one.
~x = 4
~if x > 2:
~    print('Bigger')
~else:
~    print('Smaller')
~
~print('finito')
Multi-way decision
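Multi-way decisions are made with elif. You can have as many elif branches as you need, but Python only runs the first branch whose condition is True, so make sure an earlier condition doesn't also match the cases meant for a later one (the course shows this with “multi-way puzzles”).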
~x = 4
~if x < 2:
~    print('small')
~elif x < 10:
~    print('medium')
~else:
~    print('large')
~
~print('finito')
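This sketch shows why the order of the conditions matters. It writes the same three branches twice, as small functions with hypothetical names. In the second version x < 10 is tested first, so it also captures every value below 2, and the 'small' branch can never run.

```python
def size_label(x):
    # conditions are checked top to bottom; only the first True branch runs
    if x < 2:
        return 'small'
    elif x < 10:
        return 'medium'
    else:
        return 'large'


def size_label_wrong_order(x):
    # bug on purpose: x < 10 is also True for every x < 2
    if x < 10:
        return 'medium'
    elif x < 2:
        return 'small'  # never reached
    else:
        return 'large'


for value in [0, 5, 20]:
    print(value, size_label(value), size_label_wrong_order(value))
# 0 small medium
# 5 medium medium
# 20 large large
```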
Try / Except structure
It surrounds a dangerous section of code (keep it as short as possible, ideally one line): if that code blows up, Python runs the except block instead of stopping, and the rest of the program still runs to the end.
~astr = 'hello bob'
~try:
~    istr = int(astr)  #you can't convert letters into a number, so this line blows up
~except:
~    istr = -1  #so Python runs this instead and uses -1
~print('First', istr)  #prints: First -1
~astr = '123'
~try:
~    istr = int(astr)  #'123' contains only digits, so the conversion works
~except:
~    istr = -1  #skipped, because the try block worked
~print('Second', istr)  #prints: Second 123
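This is a slightly more careful version of the same idea, as a sketch. The hypothetical helper to_int() catches only ValueError, the error int() raises for text like 'hello bob'. A bare except: also catches unrelated problems, even pressing Ctrl+C, and that can hide bugs.

```python
def to_int(text, default=-1):
    """Convert text to an int, or return default if it isn't a whole number."""
    try:
        return int(text)
    except ValueError:   # catch only the error we expect from int()
        return default


print('First', to_int('hello bob'))  # First -1
print('Second', to_int('123'))       # Second 123
```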
#4 – Functions
Only write functions when they are actually required; for 20 lines of code they are probably not necessary
~ def
def = define a function
A function is a named set of reusable code, so you don't have to type it over and over inside the file
~def ciao():
~    print('hello')
~
~ciao()  #hello
~print('how are you')  #how are you
~ciao()  #hello
~def greet(lang):
~    if lang == 'es':  #if lang is 'es' it prints hola
~        print('hola')
~    elif lang == 'fr':  #if lang is 'fr' it prints bonjour
~        print('bonjour')
~    else:  #for any other language it prints hello
~        print('hello')
~ Return value
A function can send a value back to the caller with return instead of printing it:
~def greet(lang):
~    if lang == 'es':  #if lang is 'es' it returns hola
~        return 'hola'
~    elif lang == 'fr':  #if lang is 'fr' it returns bonjour
~        return 'bonjour'
~    else:  #for any other language it returns hello
~        return 'hello'
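The return version hands a value back instead of printing it, so the caller decides what to do with the result. For example, it can combine the greeting with a name. The names here are just examples.

```python
def greet(lang):
    # return hands the value back to the caller instead of printing it
    if lang == 'es':
        return 'hola'
    elif lang == 'fr':
        return 'bonjour'
    else:
        return 'hello'


print(greet('en'), 'Glenn')    # hello Glenn
print(greet('es'), 'Sally')    # hola Sally
print(greet('fr'), 'Michael')  # bonjour Michael
```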
~ Max function
max() compares the characters of a string (by their character codes) and returns the highest one:
~big = max('hello world')  #'hello world' is the argument
~print(big)  #prints w
#5 – Loops & Iterations
~ while
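while is similar to the if statement, but it keeps repeating its block as long as the condition is True, and stops when it becomes False.
The variable that changes each time through the loop is called the iteration variable; a while loop is also called an indefinite loop.
Avoid creating infinite loops (the condition never becomes False), as the computer will keep spinning forever; on the other hand,
zero-trip loops are loops whose body never runs, because the condition is False from the start.
~n = 5
~while n > 0:  #repeats as long as n is greater than 0
~    print(n)
~    n = n - 1
~print('finito')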
~ break
break exits the loop immediately, for example once the user types a chosen input:
~while True:
~    line = input('> ')
~    if line == 'done':
~        break  #the loop ends when the user types done
~    print(line)
~print('finito')
~ continue
continue skips the rest of the current iteration and jumps back to the top of the loop (the while line):
~while True:
~    line = input('> ')
~    if line.startswith('#'):  #like line[0] == '#', but safe when the line is empty
~        continue  #skip lines starting with #, go back to the top of the loop
~    if line == 'done':
~        break
~    print(line)
~print('finito')
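The loops above read from the keyboard, which makes them awkward to test. As a sketch, the same break/continue logic can run over a list that stands in for the typed lines. echo_until_done is a hypothetical helper that collects what the loop would print.

```python
def echo_until_done(lines):
    """Return the lines the loop would print, stopping at 'done'."""
    printed = []
    for line in lines:
        if line.startswith('#'):
            continue          # skip comment lines, go back to the top
        if line == 'done':
            break             # leave the loop completely
        printed.append(line)
    return printed


print(echo_until_done(['hello there', '# skip me', 'print this', 'done', 'never seen']))
# ['hello there', 'print this']
```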
~ for
for creates definite loops: imagine the code as a contract with Python that it will run the block exactly once for each item in the list
~for y in [5, 4, 2, 3, 1]:
~    print(y)  #prints 5, 4, 2, 3, 1, one at a time
~print('finito')  #when the loop is completed, prints finito
~x = ['afternoon', 'bye', 'morning']
~for y in x:
~    print('good', y)  #prints good afternoon, good bye, good morning
~print('finito')  #when the loop finishes, prints finito
~ Largest Number
The largest number can be found by keeping the largest value seen so far in a variable and replacing it whenever a bigger one comes along:
~largest = -1
~print('inizio', largest)
~for xyz in [5, 9, 2, 8, 10, 3, 4]:
~    if xyz > largest:
~        largest = xyz  #replaces -1 with 5, then 9, then 10; smaller numbers are skipped
~    print(largest, xyz)
~print('fine', largest)  #fine 10
~ Counting using loops
~ripetizioni = 0
~somma = 0
~print('inizio', ripetizioni, somma)
~for numeri in [9, 41, 12, 3, 74, 15]:
~    ripetizioni = ripetizioni + 1  #how many times we looped
~    somma = somma + numeri  #running total
~    print(ripetizioni, somma, numeri)
~print('Dopo', ripetizioni, somma, somma / ripetizioni)  #shows count, sum and average
~ Filter using if
~print('Prima')
~for valore in [9, 41, 12, 3, 74, 15]:
~    if valore > 20:
~        print('Large number', valore)  #prints only 41 and 74
~print('Dopo')
~ Boolean Variable
A Boolean variable holds only one of two values: True or False
~cerca = False
~print('Prima', cerca)
~for valore in [4, 756, 2, 78]:
~    if valore == 2:
~        cerca = True  #when the value 2 is found, it becomes True
~    print(cerca, valore)
~print('Dopo', cerca)  #prints Dopo True
~ for & is None
We can find values (largest or smallest) in a better way by starting with None and testing it with is.
The is operator (and is not) is stronger than ==: it checks that both sides are the very same object, not just equal values, so use it mainly with None, True and False.
None is a constant that means emptiness (no value).
~piccolo = None
~for valore in [3, 2, 6, 8, 1]:
~    if piccolo is None:  #no value yet
~        piccolo = valore  #start with the first value in the list
~    elif valore < piccolo:
~        piccolo = valore  #replace it with the smaller value
~    print(piccolo, valore)
~
~print('dopo', piccolo)  #dopo 1
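Here is the same pattern wrapped in a function, as a sketch (smallest is a hypothetical name). Starting from None, instead of a made-up value like -1, means it also works for negative numbers. An empty list simply gives None back. Python's built-in min() does the same job for non-empty lists.

```python
def smallest(values):
    """Return the smallest value in values, or None if it is empty."""
    piccolo = None
    for valore in values:
        if piccolo is None or valore < piccolo:
            piccolo = valore
    return piccolo


print(smallest([3, 2, 6, 8, 1]))  # 1
print(smallest([-5, -20, -3]))    # -20
print(smallest([]))               # None
```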
#6 – Strings
A string is a sequence of characters written between ‘ ’ or “ ”.
Even if it contains digits, it is still a string; we can convert a string of digits into a number with int().
We can use the + operator to concatenate multiple strings.
Because it gives us more control, we tend to read data in as strings and then convert it if necessary.
We can also index a string using [ ], starting from 0, 1, 2…:
~nazione = 'united kingdom'
~lettera = nazione[0]  #index 0 means the first letter
~print(lettera)  #in this case u (with 3 it would be t)
~x = 3
~y = nazione[x - 1]  #the index can be an expression; we CAN'T index beyond the length of the string
~print(y)  #i
~ len
len returns the length of the string
~nazione = 'united kingdom'
~x = len(nazione)
~print(x)  #14
~ Creating loops to display the string
~nazione = 'united kingdom'
~indice = 0
~while indice < len(nazione):
~    lettera = nazione[indice]
~    print(indice, lettera)  #0 u
~    indice = indice + 1  #move to the next index: 1 n, 2 i, 3 t…
~nazione = 'united kingdom'
~for lettera in nazione:
~    print(lettera)  #same letters as above, in a simpler and more elegant way
Of course, we can also loop and count, as in the previous chapter.
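For example, this counts how many times a letter appears in a string ('banana' and 'a' are just example values). The built-in str.count() method gives the same answer in one call.

```python
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1  # one more 'a' found
print(count)  # 3

print(word.count('a'))  # 3
```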
~ Extracting part of a string (slicing)
To extract part of a string use [x:y], which means from index x up to, but not including, index y
~nazione = 'united kingdom'
~print(nazione[0:4])  #unit (index 4 is not included)
~print(nazione[7:50])  #kingdom (it works even if the end exceeds the length)
~print(nazione[:6])  #united (you can leave the first or last index blank)
~ Concatenation
~x = 'good'
~y = x + 'morning'
~print(y)  #goodmorning (no space)
~z = x + ' ' + 'morning'
~print(z)  #good morning (with space)
~ in
~x = 'united kingdom'
~'nit' in x  #True
~'b' in x  #False
~ .lower & .upper
~x = 'Ciao'
~y = x.lower()
~print(y)  #ciao (all lower case)
~z = x.upper()
~print(z)  #CIAO
~ .find
~x = 'Ciao'
~y = x.find('a')
~print(y)  #2 (index position)
~z = x.find('m')
~print(z)  #-1 (character not found)
~ .replace
~x = 'buon giorno'
~y = x.replace('giorno', 'pomeriggio')
~print(y)  #buon pomeriggio
~z = x.replace('g', 'KK')
~print(z)  #buon KKiorno (every 'g' is replaced; x itself is unchanged)
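~ Stripping whitespace
Whitespace means spaces ‘ ’, tabs ‘\t’, newlines ‘\n’ and the other blank characters. These methods return a new string and leave x unchanged:
~x = ' ciao '
~print(x.lstrip())  #'ciao ' (quotes added here to show the spaces)
~print(x.rstrip())  #' ciao'
~print(x.strip())  #'ciao'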
~ Prefixes
Basically it asks: does the string start with …?
~x = 'ciao come stai?'
~x.startswith('ciao')  #True (it is case sensitive)
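find() and slicing can be combined to pull a piece out of a line. This sketch extracts the host name from an email header line (the sample line is illustrative). It finds the @, then the first space after it, and slices between the two.

```python
data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

atpos = data.find('@')          # index of the @
sppos = data.find(' ', atpos)   # first space after the @ (search starts at atpos)
host = data[atpos + 1:sppos]    # slice between them, @ excluded
print(atpos, sppos, host)       # 21 31 uct.ac.za
```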
~ String methods
Check the Python documentation, which explains them in detail:
( https://docs.python.org/3/library/stdtypes.html#string-methods ).
Or, to check which ones are available from Python itself:
~x = 'Ciao'
~type(x)  #<class 'str'> (it is a string)
~dir(x)  #shows all the methods available for strings
#7 – Files (secondary memory)
~ Text Files
A text file can be seen as a sequence of lines.
We need to tell Python which file we want to use and what we will do with it.
For this we use the open() function, which returns a “file handle”: a variable used to perform operations on the file.
(Same concept as File -> Open in Word.)
Example: fhandle = open(‘filename’, ‘mode’), where filename is a string and mode is optional: ‘r’ (read, the default) or ‘w’ (write).
~fhand = open('file.txt', 'r')  #fhand is a wrapper around the file
~print(fhand)  #shows the wrapper: name, mode, encoding
If the file doesn't exist: traceback (FileNotFoundError)
~ \n
\n means newline (where a line ends) and it counts as a single character
~x = 'y\nz'  #prints y, (new line), z
~len(x)  #3
~ Reading files
~fhand = open('file.txt')
~for riga in fhand:
~    print(riga)  #prints every line of the file (each line keeps its \n, so print adds a blank line)
~fhand = open('file.txt')
~inp = fhand.read()  #reads the whole file into one string
~print(len(inp))  #shows the number of characters
~print(inp[:20])  #shows the first 20 characters
~ Counting lines
~fhand = open('file.txt')
~conta = 0
~for riga in fhand:
~    conta = conta + 1
~print('numero righe:', conta)  #numero righe: 100 (for a 100-line file; print after the loop, not inside it)
~ Searching
~fhand = open('file.txt')
~for riga in fhand:
~    riga = riga.rstrip()  #to remove the \n
~    if riga.startswith('From:'):
~        print(riga)  #prints all lines that start with From:
Or, using if not with continue:
~fhand = open('file.txt')
~for riga in fhand:
~    riga = riga.rstrip()  #to remove the \n
~    if not riga.startswith('From:'):
~        continue  #skip uninteresting lines
~    print(riga)  #prints all lines that start with From:
Or, to run the same program on different files, ask for the file name with input():
~fname = input('Enter file name: ')  #e.g. file.txt; the name is now stored in a variable
~fhand = open(fname)
~ Bad file names or files that don't exist
If the file doesn't exist we get a traceback (TB) error. To prevent it:
~fname = input('Enter file name: ')
~try:
~    fhand = open(fname)  #if the file exists, go on to the code below
~except:
~    print('file cannot be opened:', fname)
~    quit()  #if it doesn't, print the message and then quit
~
~conta = 0
~for riga in fhand:
~    conta = conta + 1
~print('numero righe:', conta)
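This sketch puts the chapter together: it counts lines and searches for From: lines.
- To run anywhere, it first writes a small made-up sample file to a temporary location.
- count_lines and from_lines are hypothetical helper names.
- It uses with open(...), which closes the file automatically (not covered in these notes so far).
- At the end it deletes the file and shows that opening a missing file raises FileNotFoundError.

```python
import os
import tempfile


def count_lines(path):
    # 'with' closes the file automatically when the block ends
    with open(path) as fhand:
        conta = 0
        for riga in fhand:
            conta = conta + 1
    return conta


def from_lines(path):
    found = []
    with open(path) as fhand:
        for riga in fhand:
            riga = riga.rstrip()           # drop the trailing '\n'
            if not riga.startswith('From:'):
                continue                   # skip uninteresting lines
            found.append(riga)
    return found


# Write a small, made-up sample file so the sketch runs anywhere
sample = 'From: alice@example.com\nSubject: hi\n\nFrom: bob@example.com\n'
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

n_lines = count_lines(path)
matches = from_lines(path)
print('numero righe:', n_lines)  # numero righe: 4
print(matches)                   # ['From: alice@example.com', 'From: bob@example.com']

os.remove(path)                  # delete the sample file again
try:
    fhand = open(path)           # the file is gone now
except FileNotFoundError:
    print('file cannot be opened:', path)
```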
#8 – Lists []
Lists are collections of data. Programming can be divided in two: Algorithms (sets of rules to solve problems) and Data Structures (ways of organizing data).
A list can hold numbers, strings and other types together, e.g. x = [‘ciao’, 3, 4.5] (indexes always start from 0, not 1)
Lists can even contain other lists, x = [5, [2, 3], 9], and a list can be empty: []
Strings are immutable, but lists are mutable (you can change their elements in place)
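Here is a short sketch of the difference (the values are just examples). Assigning to a position in a string raises a TypeError, while the same assignment on a list works.

```python
fruit = 'Banana'
try:
    fruit[0] = 'b'           # strings can't be changed in place
except TypeError as err:
    print('TypeError:', err)
fixed = fruit.lower()        # string methods return a new string instead
print(fruit, fixed)          # Banana banana

lotto = [2, 14, 26, 41, 63]
lotto[2] = 28                # lists can be changed in place
print(lotto)                 # [2, 14, 28, 41, 63]
```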
~ in (works the same as with strings)
~ len
~x = [1, 3, 5, 'bob']
~print(len(x))  #4
~ range
~x = [1, 3, 5, 'bob', 5]
~print(list(range(len(x))))  #[0, 1, 2, 3, 4]
~print(list(range(4)))  #[0, 1, 2, 3] (range itself is lazy; list() shows its values)
It can be used with for:
~x = [1, 4, 5]
~for y in x:
~    print('num:', y)  #num: 1 num: 4 num: 5
~
~for z in range(len(x)):
~    y = x[z]
~    print('num:', y)  #num: 1 num: 4 num: 5 (same result)
~ Concatenating
~a = [2, 4, 6]
~b = [3, 5]
~c = a + b
~print(c)  #[2, 4, 6, 3, 5]
~ Slicing
~a = [2, 4, 6]
~a[:2]  #[2, 4] (up to but not including index 2)
# list methods
dir() shows all the things we can do with a list
~x = list()
~type(x)  #<class 'list'>
~dir(x)  #[... 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'] (after many __special__ names)
~ append
~x = list()  #creates an empty list (list is a built-in function, not a reserved word)
~x.append(1)  #adds 1 to the end of the list
~ sort
~x = ['m', 'a', 'q']
~x.sort()  #sorts the list in place
~print(x)  #['a', 'm', 'q']
~ built-in function
~x = [1, 3, 5, 9, 2]
~print(len(x)) #5
~print(max(x)) #9
~print(min(x)) #1
~print(sum(x))  #20
~print(sum(x)/len(x))  #4.0
# check 4:06:05 to see some examples
~ Split
Strings and lists are related: split() breaks a string into a list. The following example is a double split to extract the mail provider.
~x = 'da email@provider.com date'
~y = x.split()  #with no argument it splits on whitespace only (punctuation like ,.:; stays in the words)
~print(y)  #['da', 'email@provider.com', 'date'] the original string split into a list of strings
~mail = y[1]  #item 1 of the list, in this case 'email@provider.com'
~z = mail.split('@')  #this time we split on the @ character
~print(z)  #['email', 'provider.com']
~print(z[0])  #email (z[1] is the provider)
~ Guardian pattern and how to fix the issues
Original code: it printed some results, but at some point it crashed with a traceback (TB) and we don't know why:
~abc = open('file.txt')
~
~for riga in abc:
~    riga = riga.rstrip()
~    prl = riga.split()
~    if prl[0] != 'da':
~        continue
~    print(prl[2])
To find the problem, we insert print() calls until we see where it crashes:
~abc = open('file.txt')
~
~for riga in abc:
~    riga = riga.rstrip()
~    print('riga', riga)  #to check why
~    prl = riga.split()
~    print('parola', prl)  #to check why
~    if prl[0] != 'da':
~        print('ignore')  #to check why
~        continue
~    print(prl[2])
After running the program we discovered that the problem was that some lines (blank lines) had no words, so prl[0] failed with an IndexError.
~abc = open('file.txt')
~
~for riga in abc:
~    riga = riga.rstrip()
~    print('riga', riga)
~    prl = riga.split()
~    print('parola', prl)
~    # Guardian pattern
~    if len(prl) < 1:
~        continue  #if the line has no words, skip it
~    if prl[0] != 'da':
~        print('ignore')
~        continue
~    print(prl[2])
Or another way, like the following:
~abc = open('file.txt')
~
~for riga in abc:
~    riga = riga.rstrip()
~    print('riga', riga)
~    if riga == '':
~        continue  #if the line is blank, skip it
~    prl = riga.split()
~    print('parola', prl)
~    if prl[0] != 'da':
~        print('ignore')
~        continue
~    print(prl[2])
Coming back to the guardian, we can get rid of the debugging prints and make the guardian stronger: in the previous code, a line starting with ‘da’ but with fewer than 3 words would still crash at prl[2].
~abc = open('file.txt')
~
~for riga in abc:
~    riga = riga.rstrip()
~    prl = riga.split()
~    # Guardian a little stronger
~    if len(prl) < 3:  #by changing to 3 it skips all the lines with fewer than 3 words
~        continue
~    if prl[0] != 'da':
~        continue
~    print(prl[2])
Or we can combine the guardian into a compound statement like this:
~abc = open('file.txt')
~
~for riga in abc:
~    riga = riga.rstrip()
~    prl = riga.split()
~    # Guardian in a compound statement
~    if len(prl) < 3 or prl[0] != 'da':  #order matters: if len(prl) < 3 is True, Python stops there and never evaluates prl[0]
~        continue
~    print(prl[2])
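The compound guardian can be tried without a file by running it over a list of lines. In this sketch, third_words is a hypothetical helper and the sample lines are made up. They include a blank line and a too-short line that would have crashed the original code.

```python
def third_words(lines):
    """Return the third word of every line whose first word is 'da'."""
    result = []
    for riga in lines:
        prl = riga.rstrip().split()
        # guardian: the length is checked first, so prl[0] and prl[2] are safe
        if len(prl) < 3 or prl[0] != 'da':
            continue
        result.append(prl[2])
    return result


sample = ['da email@provider.com date\n', '\n', 'da\n', 'altro testo qui\n', 'da uno due tre\n']
print(third_words(sample))  # ['date', 'due']
```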
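#9 – Dictionaries or Collections
list: a linear collection of values (in order), indexed by position [number]
dictionary: a “database” of values, each stored under its own key, indexed by key [‘word’] (since Python 3.7 dictionaries remember insertion order, but you look values up by key, not by position)
If you look up a key that doesn't exist you get a traceback (KeyError)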
~ Creating dictionaries
~x = dict()  #creates an empty dictionary; you can also use x = {}
~x['pr'] = 3  #stores 3 under the key 'pr'
~x['dpp'] = 6  #stores 6 under the key 'dpp'
~print(x)  #{'pr': 3, 'dpp': 6}
~print(x['dpp'])  #6
~x['dpp'] = x['dpp'] + 1  #to update the value of dpp
~print(x)  #{'pr': 3, 'dpp': 7}
~ Counting with a dictionary
~x = dict()
~x['primo'] = 1
~x['sesto'] = 1  #assign 1 to a word to start its count
~print(x)  #{'primo': 1, 'sesto': 1}
~x['sesto'] = x['sesto'] + 1
~print(x)  #{'primo': 1, 'sesto': 2}
Counting the names in an existing list:
~x = dict()
~lista = ['casa', 'topo', 'cane', 'topo', 'topo']
~for nome in lista:
~    if nome not in x:
~        x[nome] = 1  #first time we see this name
~    else:
~        x[nome] = x[nome] + 1  #seen before: add 1
~print(x)  #{'casa': 1, 'topo': 3, 'cane': 1}
~ get
get(key, default) returns the value for a key, or the default (here 0) if the key isn't there, so it replaces the if…else… with a single line:
~x = dict()
~lista = ['casa', 'topo', 'cane', 'topo', 'topo']
~for nome in lista:
~    x[nome] = x.get(nome, 0) + 1
~print(x)  #{'casa': 1, 'topo': 3, 'cane': 1}
~ A counting pattern program in general
~x = dict()
~print('Enter a line of text:')
~riga = input('')
~
~lista = riga.split()
~
~print('words:', lista)
~
~print('counting...')
~for nome in lista:
~    x[nome] = x.get(nome, 0) + 1
~print(x)
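Here is the same get() counting pattern as a reusable function, as a sketch (word_counts and the sample sentence are illustrative). For comparison, it also shows the standard library's collections.Counter, which does the same counting but is not covered in these notes.

```python
from collections import Counter


def word_counts(text):
    counts = dict()
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1  # 0 the first time a word is seen
    return counts


line = 'the clown ran after the car and the car ran into the tent'
counts = word_counts(line)
print(counts)
# {'the': 4, 'clown': 1, 'ran': 2, 'after': 1, 'car': 2, 'and': 1, 'into': 1, 'tent': 1}

print(Counter(line.split()).most_common(1))  # [('the', 4)]
```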
~ Retrieving lists of keys and values
~x = {'casa': 1, 'topo': 3, 'cane': 1}
~print(list(x))  #['casa', 'topo', 'cane'] (the keys)
~print(x.keys())  #dict_keys(['casa', 'topo', 'cane'])
~print(x.values())  #dict_values([1, 3, 1])
~print(x.items())  #dict_items([('casa', 1), ('topo', 3), ('cane', 1)])
~ Two iteration variables
~x = {'casa': 1, 'topo': 3, 'cane': 1}
~for aa, bb in x.items():
~    print(aa, bb)  #casa 1, then topo 3, then cane 1 (key and value each time)
# A counting program with what we learned so far
~file = input('Enter file: ')
~handle = open(file)
~
~conto = dict()
~for riga in handle:
~    nome = riga.split()
~    for parola in nome:
~        conto[parola] = conto.get(parola, 0) + 1
~
~grandeconto = None
~grandeparola = None
~for parola, quante in conto.items():  #a new name for the count, so the dictionary conto isn't overwritten
~    if grandeconto is None or quante > grandeconto:
~        grandeparola = parola
~        grandeconto = quante
~
~print(grandeparola, grandeconto)  #the most common word and how many times it appears
#10 – Tuples
Tuples are like lists, but they are immutable (they can't be modified), which makes them faster and more efficient than lists and dictionaries.
They use () and can also appear on the left side of an assignment, to assign several variables at once (as long as the right side has the same number of values): (x, y) = (4, ‘tom’)
In the case of dictionaries, items() returns each pair (key, value) as a tuple.
Tuples can be compared with < and >, giving True or False: Python compares them from left to right and stops at the first difference.
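This sketch shows a few small tuple examples with made-up values:
- tuple assignment used to swap two variables
- left-to-right comparison
- the (value, key) trick used below to sort a dictionary by its values

```python
x, y = 4, 'tom'
x, y = y, x                  # swap without a temporary variable
print(x, y)                  # tom 4

print((0, 1, 2) < (5, 1, 2))                  # True: 0 < 5 decides
print(('Jones', 'Sally') < ('Jones', 'Sam'))  # True: 'Jones' ties, then 'Sally' < 'Sam'

counts = {'a': 10, 'b': 1, 'c': 22}
by_value = sorted([(v, k) for k, v in counts.items()], reverse=True)
print(by_value)              # [(22, 'c'), (10, 'a'), (1, 'b')]
```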
~fhand = open('file.txt')
~cnt = dict()
~for riga in fhand:
~    prle = riga.split()
~    for pr in prle:
~        cnt[pr] = cnt.get(pr, 0) + 1  #count each word
~
~lst = list()
~for k, v in cnt.items():
~    ntup = (v, k)
~    lst.append(ntup)  #a list of (count, word) tuples, one for every word
~
~lst = sorted(lst, reverse=True)  #sort by count, highest first
~
~for v, k in lst[:5]:
~    print(k, v)  #prints the top 5 words and their counts
Simplified way:
~fhand = open('file.txt')
~cnt = dict()
~for riga in fhand:
~    prle = riga.split()
~    for pr in prle:
~        cnt[pr] = cnt.get(pr, 0) + 1
~
~lst = sorted([(v, k) for k, v in cnt.items()], reverse=True)  #a list comprehension reduces the code lines
~
~for v, k in lst[:5]:
~    print(k, v)
#11 – Regular expressions
Note: they are not essential for using Python, but they are worth knowing; I will eventually come back to them, but for now:
Regular expressions (regex or regexp) provide a concise way of searching for and extracting strings in text and files.
They use their own special characters and are written in a formal language of their own.
They are not built into the core language but come with Python's standard library; to use them type import re
Some examples: re.search(), which is similar to find(), and re.findall(), which is like a combination of find() and slicing (it returns the matching parts)
~Quick Guide for regex
https://www.py4e.com/lectures3/Pythonlearn-11-Regex-Handout.txt
^ #matches the beginning of the line
$ #matches the end of the line
. #matches any character
\s #matches a whitespace character
\S #matches any non-whitespace character
* #repeats the previous character zero or more times (greedy: longest match)
*? #repeats the previous character zero or more times (non-greedy: shortest match)
+ #repeats the previous character one or more times
+? #repeats the previous character one or more times (non-greedy)
[aeiou] #matches a single character in the listed set
[^XYZ] #matches a single character not in the listed set
[a-z0-9] #the set of characters can include a range
( #indicates where string extraction is to start
) #indicates where string extraction is to end
Wild card characters:
.* #matches any character, any number of times
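Here are some of the patterns above in action, as a sketch using sample strings (the email header line is illustrative). The r'...' raw strings stop Python from treating backslashes like \S as string escapes.

```python
import re

line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# re.search() answers "is the pattern there?", like find() but with a pattern
if re.search(r'^From ', line):
    print('line starts with From')

# re.findall() returns every match as a list of strings
numbers = re.findall(r'[0-9]+', 'My 2 favorite numbers are 19 and 42')
print(numbers)  # ['2', '19', '42']

# greedy .+ grabs as much as possible, non-greedy .+? as little as possible
greedy = re.findall(r'^F.+:', 'From: Using the : character')
lazy = re.findall(r'^F.+?:', 'From: Using the : character')
print(greedy)   # ['From: Using the :']
print(lazy)     # ['From:']

# parentheses: the whole pattern must match, but only the group is returned
emails = re.findall(r'^From (\S+@\S+)', line)
print(emails)   # ['stephen.marquard@uct.ac.za']
```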
#12 – Networked Programs
[6:22:40]
*Dr. Chuck's book on the subject: Introduction to Networking
Our programs work on top of the transport layer (TCP) of the TCP/IP protocol stack.
A TCP connection endpoint is called a socket: an endpoint of a bidirectional inter-process communication flow across the internet.
TCP port numbers are like phone extensions for a host: they identify which application (process) we are talking to (e.g. 10.10.10.10:80). Port numbers go up to 65535.
There are some common TCP ports such as:
- [21] FTP = File transfer
- [22] SSH = Secure login
- [23] Telnet = Login
- [25] SMTP = Mail
- [53] DNS = Domain Name
- [80] HTTP
- [109/110] POP = Mail Retrieval
- [143/220/993] IMAP = Mail Retrieval
~Sockets in Python
Python has built-in support for TCP sockets
~import socket  #you need to import the library
~mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  #create a TCP socket
~mysock.connect(('data.py4e.org', 80))  #connect to the host on port 80 (it fails if nobody answers)
~Application Protocol
Application Layer 1 –> Transport Layer 1 –> Transport Layer 2 –> Application Layer 2 [for transferring data]
Using HTTP (read the RFC to know the rules of the protocol)
~Writing a Web Browser
~import socket
~
~mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
~mysock.connect(('data.py4e.org', 80))
~cmd = 'GET http://data.py4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()  #the request; HTTP lines end with \r\n and a blank line ends the request
~mysock.send(cmd)  #send the request
~
~while True:  #start receiving data
~    data = mysock.recv(512)  #up to 512 bytes per block
~    if len(data) < 1:
~        break  #no more data means the server is done, so break
~    print(data.decode(), end='')  #print the decoded version of the data
~mysock.close()  #close the connection
~Characters and Strings
Until now we used ASCII (American Standard Code for Information Interchange), which covers only basic Latin characters.
It uses 1 byte (8 bits) per character, and every character has a number (they are sorted in order).
To check the number associated with each character (https://en.wikipedia.org/wiki/ASCII):
~print(ord('H'))  #72: H is assigned the number 72
Unlike ASCII, Unicode can represent every character in the world.
The problem is that UTF-32 uses a fixed length for each character (4 bytes, 32 bits).
UTF-16 is more compact (2 bytes for most characters, 4 for the rest), but still too large for mostly-Latin text.
And finally we have UTF-8, which uses a variable length from 8 to 32 bits (1 to 4 bytes). Its one-byte characters are the same as ASCII, so it is compatible with ASCII, and it is the most used encoding on the web.
Because sockets send and receive bytes, in Python we need to encode data (in UTF-8) before sending it, and decode the data we receive before printing it.
string –encode–> bytes || bytes –decode–> string
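This short sketch shows the round trip (the word 'caffè' is just an example containing a non-ASCII letter). encode() turns a string into bytes, using UTF-8 by default, and decode() turns the bytes back into a string.

```python
print(ord('H'), chr(72))      # 72 H: chr() is the reverse of ord()

text = 'caffè'
data = text.encode()          # str -> bytes (UTF-8 by default)
print(data)                   # b'caff\xc3\xa8'
print(len(text), len(data))   # 5 6: 'è' takes 2 bytes in UTF-8
print(data.decode())          # caffè (bytes -> str)
```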
~urllib (easier HTTP)
~import urllib.request, urllib.parse, urllib.error  #import libraries
~
~fhand = urllib.request.urlopen('http://data.py4e.org/romeo.txt')  #open the URL; the result can be looped over like a file
~for line in fhand:
~    print(line.decode().strip())  #each line arrives as bytes, so decode it
You can also use it on .html pages to read web pages
~import urllib.request, urllib.parse, urllib.error
~
~fhand = urllib.request.urlopen('http://data.py4e.org/romeo.html')
~for line in fhand:
~    print(line.decode().strip())
~web scraping or crawling
Note: be careful, as not every site allows you to use a program to scrape their website
Beautiful Soup is a library that parses the HTML of the pages; it copes with the many errors that real-world web pages contain.
Note: Beautiful Soup has to be installed ( https://pypi.python.org/pypi/beautifulsoup4 )
~import urllib.request, urllib.parse, urllib.error  #import libraries
~from bs4 import BeautifulSoup  #import BS
~
~url = input('Enter - ')  #URL of the page to scrape
~html = urllib.request.urlopen(url).read()  #open the URL and read the HTML
~soup = BeautifulSoup(html, 'html.parser')  #parse it with BS
~
~tags = soup('a')  #retrieve all the anchor tags
~for tag in tags:
~    print(tag.get('href', None))  #print each link
#13 – Using Web Services
[7:24:10]
To be continued...