2019-05-01
term: 2B

Software Engineering Principles, taken in Spring 2019.

meta

Software Engineering: collection of principles and practices, techniques, processes, tools… that aim to improve software quality, dev productivity, scalability to larger systems, evolvability, more hot words here…

website: student.cs.uwaterloo.ca/~cs247

Rob Hackman: r2hackman@uwaterloo.ca Monday 3:30-5:00 PM office hours DC 2128

Raghav Sethi: cs247@uwaterloo.ca MC 4065

Assignments (5% x 3) - 1 late chance Project (8% + 7%) - 1 late chance Midterm (20%) June 27th, 2019. 4:30 to 6:20 PM. Final (50%)

Course Overview


---


software engineering principles

c vs c++ : why don’t we use c for OOP?

c can do OOP, but:

Hence c isn’t as well suited for OOP as c++.

adt (abstract data types)

Abstract data types are user-defined types that bundles together: ranges of valid values or data and functions and operations over those values. ADTs are a means to an end, the client code helps define the ADT’s functionality and only needs to be aware of the interface of the ADT (not the implementation).

Abstracts some complex idea of data.

Use the compiler or definition to enforce the proper usage/rules of the ADT. i.e. a valid range of data

Scope (how ADTs tie into usage): ClientCode { ADT Interface { ADT Implementation } }

example:

#include <iostream>
using namespace std;
int main(){
	cout << "enter rational number (a/b) : ";

	//need to define default ctor for Rational class
	Rational r,p,q;

	//overload >> operator w istream&, return Rational&
	//needs to be ref because:
	//1. modify original value later
	//2. iostream can't be copied, won't compile by value
	cin >> r >> q;

	//overload + operator with Rationals, return by value
	//needs to be by value because new object result
	//if by ref: heap allocate rational ref, manually delete later???
	//no. that's a mem leak.
	p = q + r;

	//overload << operator w ostream, return const Rational&
	// const ref optimized by compiler, faster copy
	cout << q/r;

	//define copy ctor for rational adt
	Rational z{q};
}

Things to define:

does an adt have to be a class? nope.

For example, a struct in c can also implement an adt. Why use classes then?

do you always have a default constructor? nope.

Only if there are appropriate defaults to set. i.e. default birthday????? doesn’t make sense.

why use MIL (member initialization list)?

Need to use if fields or superclass doesn’t have a default constructor. For example, if a superclass has a constructor with params, it loses the default constructor. Then, you would need to specify the constructor using the member initialization list.

Object Construction Process in C++:

 class A {
 	public:
 		A(int x): ....{};
 }
 class B {
 	A a;
 	public: 
 		B(){ a = A(...); };
 }

What if the superclass’ default constructor does exist? A is constructed twice, and then is move assigned/copied to another “A” object into a

Another factor to consider: SPEED. If you don’t initialize properly (without MIL), your superclass gets default constructed and then you waste that time because you’ll just overwrite that default constructor in the body again.

aka you created an object you didn’t want, just to overwrite it again in the object’s constructor.

Just initialize A right the first time… don’t initialize an object multiple times.

exaggerated example:

class Foo {
	public: 
		Foo() : size{large}, x{new int[large]};
}

class Bar {
	Foo f;
	#ifdef FAST

	#else 
		Bar(size_t x){ //unfortunately initialized to random values
			Foo otherF{x};
			//reassign a shit ton of values copying things over... 
			f = otherF;	
			//time consuming
		}
}

defining the rational class

class Rational { 
public: 
	Rational();
	Rational(int num, int denum) throw (char const*); // can 
	explicit Rational (int num);
private:
	//functions to set/reset the Rational
}

Accessor: will not change state of object, only returns, getter

Mutator: can change content of class, setter, usually checks client-provided values, may throw error (use exceptions to guarantee)

constructor with MIL

Construction only happens once. MIL only for constructor. Initialization happens once, can’t use MIL for other functions.

attempt 1

class Rational{
	int num_, denum_;
	
	public: 
		Rational() : num_(0), denum_(0) {};
		Rational(int num_) : num_(0), denum_(0) {};
		Rational(int num_, int denum_) : num_(0), denum_(0) {};
}

Too many constructors, gross. Can replace all that with a ctor with parameters.

attempt 2

class Rational{
	int num_, denum_;
	
	public: 
		Rational(int num_ = 0, int denum_ = 0) : num_(num), 
		denum_(denum) {};
		// default parameters must be trailing
		// foo(int x, int y=s, int z) is NOT ok, 
		// can't put z as the last
		// becomes ambiguous, not formal language
}

copy constructor

Rational(const Rational& r) : num_(r.num), denum_(r.denum) {};

overloads on operators

You can define operators (functions with names as symbols) with custom functions. The only guideline is the function name: operator concat function_symbol.

i.e. the copy assignment operator is an overload on the “=” operator

Rational& operator=(const Rational& other){
	num_ = other.num_;
	denum_ = other.denum_;
	return *this;
}

Why is the return “this” ? To be able to chain copy assignments i.e. a = b = c = d = e;

We could copy, but if we know that it exists in this scope, then we should return by reference instead. Also because code like (p=a)++ exists. ??????????????

The first param is implicitly the Rational pointed by the this pointer for operators.

operator+: between 2 Rationals

attempt 1 : as a non-member function

This is a non-member function (outside of the Rational class). Hence, no access to numerator or denumerator.

Rational operator+(const Rational& LHS, const Rational& RHS) { 
	//const ref (2nd term) for speed optimization
	//is friend so we can access private fields
	Rational p{LHS.num_ * RHS.denum_ + RHS.num_, 
		RHS.num_ * LS.denum_, denum_ * RHS.denum_}
} 

We don’t want to release the information of the private fields, hence declare operator+() as a friend inside the class:

friend Rational operator+(const Rational&, const Rational&);

Argument names not needed in function prototype(?). Note that you can declare things many times, but can only define things once.

attempt 2 : member function

Rational operator+(const Rational& RHS) { 
	//const ref (2nd term) for speed optimization
	return Rational{num_ * RHS.denum_ + RHS.num_ * denum_,
	 	denum_ * RHS.denum_}
};

attempt 3 : member function again but with an int

Rational operator+(int x) { 
	//const ref (2nd term) for speed optimization
	return Rational{x * denum_ + num_, denum_}
};

Currently will not work because we’re missing Rational + int.

Rational p = r + 7; is defined Rational p = 7 + ris not defined

Need to define as a friend function??? but whyyy

friend Rational operator+(int x, const Rational&);

operator/: between 2 rationals

operator«: between ostream and a Rational

friend ostream& operator<< (ostream&, Rational&);
ostream& operator<< (ostream& out, const Rational& r){
	return out << r.num_ << "/" << r.denum_ ;
}

Needs to also be chained i.e cout << a << p << r; Hence returning the ostream reference.

operator»: between istream and a Rational

friend istream& operator>> (istream&, Rational&);
istream& operator>> (istream& in, Rational& r){
	return in >> r.num_ >> r.denum_;
}

Needs to also be chained i.e cin >> a >> p >> r; Hence returning the istream reference.

Do the involved variables have to be member functions? The input and output operators literally can’t be member functions.

In: ` cin » r ` and ` cout « r `, the type of the first argument is iostream. These argument are the streams, hence CAN’T be default as a member function.

note: Overloaded operators as member functions always take the self object as the first parameter. Hence, can’t implement » and « as member functions (they are implied)

Therefore: output and input CAN’T be member functions

In general, prefer to design non-member, non-friend functions

function overloading

allow to use the same function name but with different number of parameters/argument signature Different return type cannot validate a function overload

BAD overload:

int foo();
char foo();

default overloading

use to combine variants of functions

Note that default arguments MUST only appear in the function declaration

trailing parameters may have default values

once the default is used int he function call, all next arguments in call must be defaults

i.e.

Rational (int num, int denom = 1);

operator overloading

How to detect?? the signature of the operator (i.e. argument types, return type, const, value or ref different from default operator definition)

Best practice: don’t define anything unexpected i.e. operator= returns a boolean

non-member functions

Why does » and « need to be non-member functions? Should be, since the first operand is a reference to the stream object

Best practice: the return value is modified so that the stream operations can be chained.

critical function of the ADT that is declared outside, to reduce ability to direct-access from the back

what is a friend

classes have private, public and protected friends

Data members should be private

you can provide accessors and mutators in the public fields instead

Whenever you use a friend keyword, that operator can access private fields, but NOT commutative (friend -> )

Friendship in c++ must be used carefully.

type conversion of ADT objects

use the explicit keyword to prevent compiler from using constructors that have one argument to perform implicit type conversion.

the explicit keyword:

Explicit disallows the compiler to implement implicit type conversions. If no explicit, then the compiler can implicitly casts integers to become Rational types.

The code won’t break if the dev passes an int, but then the bug would be much harder to be found.

Also useful for functions with more than one default argument.

helper functions

R-Value (return value)

The return value of foo() is guaranteed to not exist after the foo() statement. R-values in c++ are temporary values: lives in a register without an address.

R-values only appear on the right hand side.

Node foo() {
	Node p ....
	return p;
}

int main() { 
	Node n = foo();
}

Note here, you can’t call bar() on an r-value, because its type is an l-value reference of 5+3. Since 5+3 is stored in a register temporarily, no address so none to pass to bar().

void bar(int &x){
	... some assignment command idk
}
int main(){
	bar(5+3); //doesn't work
}

UNLESS 5+3 is const: the compiler will allow it by potentially creating a temporary address. Since it’s const, the compiler knows that you won’t mutate it, so the address can be created.

You can’t bind an r-value to an l-value.

L-Value (lifetime value)

The opposite of an r-value, it’s guaranteed to have a memory address and has a lifespan past the individual statement.

L-values can appear on the left hand side, except const l-value (can’t be assigned).

linked list by hand attempt 1

We want the client to do: Node n{3,new Node{2, new Node{}}};, which makes a linked list of 3->2->1.

Also needs this functionality:

Node p{n};
p.next->data = 20;
cout << p; //expected: 3, 20, 1
cout << n; //expected: 3, 2, 1

Tangent: what’s a string? it’s an array of char even in C++, just wrapped in a handy String class.

constructor attempt 1

We want deep copy, implemented using recursion. (Not going to worry about encapsulation rn). Why did we want deep copying? Lists need to be distinct. But is there any duplication? Nope, it’s guaranteed that the original copy will die after the line of creation. (r-value)

class Node {
public: 
	int data_;
	Node *next_;
	//constructor
	Node (int data, Node *next_ = nullptr); 
	//deep copy constructor with recursion
	Node (const Node& other) : data {other.data}, 
		next {other.next? new Node{*other.next} : nullptr};
	//base case == next is nullptr, and normal case
}

Knowing r-values, we actually don’t need to deep copy the list.

constructor attempt 2 (move constructor with shallow copy)

Sidenote: Node (Node &&o) is not a reference to a reference, it’s the ref of an r-value.

We have temporaries to copy (i.e. move constructor) but we don’t want to deep copy because that’s slow. (And don’t care for deep copying).

How else can we move it? Shallow copying the pointer.

Node (Node &&o): next{o.next}, data{o.data}{
	o.next = nullptr;
}

When you move o.next, it immediately dies and frees its list, so we reset o.next to a nullptr so that it doesn’t deallocate the list we just stole with the MIL. So instead, it’ll become a dangling pointer, it doesn’t delete the list immediately.

NOTE: that o is a reference to an r-value. But o as a parameter is not an r-value: o.data is an l-value.

what is data{o.data} ??? We don’t want to call the copy constructor because o is about to die and we don’t want to deep copy it. Let’s move o.

data{o.data} calls the string’s copy constructor. BUT we want to treat this thing as an r-value, because we know that it’s an r-value.

Solution: std::move overrides the compiler’s decision, swaps it like an r-value.

using std::move;
Node (Node &&o): next{o.next}, data{o.data}{
	o.next = nullptr;
}

copy assignment

copy assignment operator attempt 1

Node& operator= (const Node &other){
	data = other.data;
	delete next;
	next = new Node{other.next} //BUT new might fail
	return this;
}

Reasons why new would fail:

What’s wrong if new fails?? It’s not NULL, it’s a dangling pointer. If you dereference a dangling pointer, you won’t segfault, but you’ll get garbage instead :(

If new fails, we want to ensure that our Node doesn’t get mutated. How? Call new first.

copy assignment operator attempt 2

Node& operator= (const Node &other){
	Node &tmp = new Node{other};
	data = other.data;
	delete next;
	next = tmp;
	return this;
}

A failed new could be handled by exceptions now. Better to write this way.

First copy, then delete, also solves another issue. What happens with :

Node n ....;
n = n;

Self-assignment, assigning this object to reference. When you delete next, you’re freeing the memory for the next pointer

When you try to deep copy other’s next pointer, you

Trying to reference a dangling pointer is a memory error. The copy and swap idiom solves this issue, but you’re still doing a deep copy unnecessarily.

Solution: if self-assignment, return this. Two addresses are the same, if they reside in the same location in memory. Check for self-assignment by `if (this == &other) return *this;

copy assignment operator attempt 3: copy and swap idiom

Node& operator= (const Node &other){
	Node tmp{other};
	using std::swap;
	swap(data,other.data);
	swap(next,other.next);
	return *this;
}

First co

destructor (memory leak handling)

~Node() { delete next; }

Explicitly calling the destructor for next isn’t enough. How to deallocate the next Node?

Every time there is a new, there must be an equal and opposite delete.

When you delete a heap-allocate object, it goes out of scope and gets destructed. The idea of recursion calls each element next on the list, which handles the rest of the nodes.

At the end, delete the nullptr and then we’re good.

move assignment operator

move assignment operator attempt 1

Node& operator= (Node && other){
	data = other.data;
	next = other.next;
}

This would fail test cases. Once other goes out of scope, the list gets deleted.

move assignment operator attempt 2

Node& operator= (Node && other){
	data = other.data;
	delete next;
	next = other.next;

}

Memory leaks. Better than

move assignment operator attempt 3

#include <utility>

Node& operator= (Node && other){
	using std::swap;
	//within this scope, swap is from std
	swap(data,other.data);
	swap(next,other.next);
	return *this;

}

Why swap instead of assigning? Easier to implement, more memory efficient than copying. We reused swap, might as well create a private swapping function.

closing notes of the linked list

Client can break rules…

Node n {3, new Node{2, nullptr}}
{
	Node p{4, &n}
}

Error with constructing Node p! p will go out of scope when exiting the code block, and runs p’s destructor, which calls delete on the next pointer. BUT!!! the next pointer was a stack-allocated address, can’t call delete on that! Even if n is heap-allocated, then n is still going to end up as a dangling pointer, which is not good behavior.

Hence, we have some rules that we must uphold so that our ADT can work properly. Rules are a result of some assumptions we made when writing the code.

Invariant: assumptions or properties that must hold for the code to work properly. Assumption example: delete next; in destructor, (either assume that object is always a heap-allocate object, or it’s never a heap-allocated object).

You cannot trust that your client code will uphold your invariants. Also, they shouldn’t have to. Don’t expect the client to memorize all your rules.

inbugger is you.

summarizing the big 5

copy constructor, move constructor, copy assignment operator (cao), move assignment operator (mao), destructor.

General rule of thumb

If you have to provide a user-defined definition for one of these above, you should define all of them.

destructor

Use destructor when an object goes out of scope. If heap allocated, then when object’s delete is called. If stack allocated (statically allocated), when their enclosing scope ends.

class B {};
class A {
	B b;
}
A *p = new A;
delete p;

Object A pointed by p is dynamically allocated. Object B pointed by p is on the heap (created as part of A), is statically allocated. When *p dies, b dies as well.

copy constructor

If case 1, 2 use an r-value as the argument, then the move ctor is used as well.

The compiler provides built-in implementations of ctors: the copy ctor executes a byte-wise copying all plain old data PoD (int, char, bool, all basic types, pure structures) and copy constructs all object fields.

Doing a byte-wise copy on pointers is insufficient, because you’d copy the pointer’s address and not make a new object. Hence copy ctor was redefined for the linkedin.

The built-in move constructor also does byte-wise

The copy assignment operator byte-wise copying all PoD, and copy assigns all object fields.

You’ll lose efficiency if you only define the CAO, it will never use MAO because it’s not defined.

invariants

assumptions that must hold true for our code to work properly. It’s hard//impossible to reason about the code without them.

i.e. Node assumes ownership of the next Node, in order to be able to delete the next Node. In other words, next Node should be a heap-allocated object then

i.e. Linked List should not be cyclical

i.e. Node shouldn’t share the next pointer

^^^ currently we have no control over this.

Should not be the client’s responsibility to know and uphold the rules. Why can the client break these rules:

There is currently no checks in constructor, allows the client to select the next pointer (next Node), but we can’t guarantee that they use the right ones.

Allowing the client to use nodes at all is representation exposure

The ADT we wantto give to the client is a linked list, not Node pointers. The clients really shouldn’t know how the linked list is implemented (or all the invariants of the implementation).

THEREFORE: we want to encapusulate the implementation logic and hide it from the client

linked list by hand attempt 2

i.e. usage

List l;
l.push(3).push(2).push(1); 
//{1,2,3}
l.ith(i) = 5;
//list.h
class List {
	class Node {
		//private nested member
		//implement as before
	}
	Node *head;
public: 
	List() head {nullptr};
	List& push(int data);
	//Big 5 for lists:
	int& ith(int i);
}
//list.cc
List::Node::Node(int data, Node *next) ...
List& List::push(int data) {
	head = new Node { data, head };
	return *this;
}

Changing the header file: client code needs to recompile

Changing the cc file: client code doesn’t need to recompile???

reference is just part of a type signature

Giving integers by reference

int & List::ith(size_t i) {
	Node * node = head;
	while ( i > 0 && node){
		--i;
		node = node->next;
	}
	return ith->data;
	//crashes if node
}

Why is ith() bad??? Look at this client code:

#include <iostream>
using namespace std;

...
list l; 
size_t (....)
for (size_t i = 0; i < len; ++i){
	out << l.ith(i) << endl
}

ith() takes to iterate through a list…

Give user a similar way to iterate through the lists as they did before, but not expose the representation

What we want is an iterator:

See client code:

for (thing = beginning of list; thing != end of list; ++thing){
	*thing = *thing + 1;
	cout << *thing;
}

What do we need for an iterator??

1: get the beginning of list::iterator i.e. list::begin() 2: get the end of the list::iterator i.e. list::end() 3: be able to increment the list::iterator 4: implement the unary star operator (*) dereference operator 5: need inequality (and equality is nice to have )