Sim's Programming Learner's Checklist
Currently an early work in progress. This list sucks right now, and honestly, it's a lot harder to write than I thought, but I'm gonna keep trying to figure it out!
If you want a comprehensive background as a computing professional, you should look into everything in this checklist (unless it's optional or otherwise stated). This list is primarily targeted at self-learners who are not going the traditional route through university computer science and would like to be able to easily check whether the learning resources they find is on the right track.
Programming Fundamentals
Side-note: This section assumes your first programming language is an imperative language. If you don't know what I mean by that, then you are almost certainly using an imperative language. You will likely be told by your instructor/book if your language is a functional language.
- Add more content.
- Also talk about statically typed languages here?
- Cover the topic of threading and concurrency fundamentals. Also introduce higher-level concurrency abstractions from various languages/contexts, but don't go into too much depth.
- Cover the topic of automated testing (particularly unit testing). Briefly introduce TDD.
C Language Fundamentals
Chances are, the language you started with has made things very convenient for you. I think that every serious programmer should eventually try learning how to use C because C doesn't hold your hand (nearly as much). C will give you so much appreciation for all the bells and whistles other languages give you and give you a more critical understanding of algorithm and data structure performance. Additionally, good C programming requires the programmer to be aware of a few more very useful lower-level concepts, so C is a great context in which to learn about them.
C is also a great stepping stone into learning C++ and is somewhat necessary to learn about operating systems, digital circuits, computer architecture, compilers, and many security topics.
I very highly recommend learning C as your first language if you can, though it's understandably not for everyone because it's a very tedious language to work with. If you don't like it, there's no harm in holding it off for later when you're more motivated.
- COREQUISITES: Programming Fundamentals
- Know the concept of memory safety.
- Very briefly introduce assembly language and that C compilers compile down to it.
- Cover the topic of text encoding. Subtopics: ASCII and Unicode, null-terminated strings vs. length-prefixed strings, line feed and carriage return issues.
- Cover the topic of memory layout. Subtopics: segmentation, call stack, heap.
System-Level Fundamentals
While you might be able to get away with ignoring the system-level stuff, it gives you a stronger intuition around abstraction in computing while greatly expanding your toolset for solving problems in computing (such as optimizing your code's memory access patterns to improve performance).
- COREQUISITES: C Language Fundamentals
- Cover the topic of assembly and machine language.
- Cover the topic of processes. Subtopics: virtual memory, physical memory, caching, IPC, and executable binaries.
- Cover the topic of linking.
Essential Miscellany
This section is about all the random little things that you should know about that don't neatly fall under any of the other sections. Some of these topics are quite essential: everyone must know how to work with a terminal, use version control, and write regular expressions. Others, you might find you really like using in your day-to-day work, like vim and Docker.
But at the very least, you should be aware of everything in this section to keep at the back of your mind. It all gives you so much background behind everything to do with computers, and they expand your toolset for solving problems.
- PREREQUISITES: Programming Fundamentals
- Go through The Missing Semester of Your CS Education from MIT. Go through anything you're not yet familiar with.
- The three most important topics you need to be very familiar with are:
- The Linux/Mac/*nix command line and shell scripting
- NOTE: Even if you want to use Windows, I still think you should learn how to use the Linux/Mac command line. It's the nicest way to learn how to use a command line, and I almost guarantee you you will eventually be using a Linux or Mac box at some point.
- git
- NOTE: Except in certain rare circumstances (such as maybe game development where file sizes are huge and rapidly changing), git is your best friend in everything. And even if it's not, learning about git will give you much-needed experience into how to use version control systems.
- Regular expressions
- The Linux/Mac/*nix command line and shell scripting
- While you might not want to use vim, I still think everyone should be aware of what it's capable of, why it exists, why it's so useful, and how to exit vim.
- If you want to learn more about command-based text editors, also look into emacs.
- You absolutely need to be aware of profiling and how to use it to your advantage for optimizing your code's performance.
- For example, what is a hot spot, and how important are they in directing your code optimization efforts?
- The three most important topics you need to be very familiar with are:
- Cover the topic of security and cryptography fundamentals.
- Cover the topic of containerization (and Docker).
- Maybe cover the topic of makefiles and similar build tools. Or at least have it be an extension topic.
- Cover the topic of project management. Subtopics: agile/scrum/kanban, waterfall, RAD.
References and further reading:
- Computer Stuff They Didn't Teach You by Scott Hanselman
Mathematics
No, you technically don't need all that advanced university calculus and algebra, but there is some very unavoidable mathematics that you do need to be a great programmer.
- Add more content.
Data Structures and Algorithms
Core topics:
- PREREQUISITES: Programming Fundamentals; Mathematics
- I highly highly highly recommend doing data structures and algorithms exercises in the C programming language rather than something like Python, Java, or Javascript, because C forces you to work with more-primitive programming abstractions, which forces you to think very carefully about more aspects of each algorithm. I also recommend using a memory debugging tool such as valgrind when evaluating your C solutions in order to check for memory leaks.
- Understand big-O as a mathematical concept and why it's useful for run time/space analysis of data structures and algorithms.
- For every data structure and algorithm below, you must be able to apply big-O run time/space analysis to all important aspects of the algorithm, data structure, or data structure's operations.
- Compare and contrast linear search and binary search. Notably, what are the big-O time complexities of these algorithms? Which algorithm is faster?
- Sorting Algorithms
- Know how bubble sort, insertion sort, and selection sort work.
- Justify why these algorithms have O(n²) runtime complexity.
- Understand what it means for a sorting algorithm to be stable.
- For every sorting algorithm you study, you must be able to figure out whether or not it can be a stable sort, and if so, what design choices allow it to be stable.
- Understand and compare some common efficient comparison-sort algorithms: merge sort, quicksort, heapsort, and shellsort.
- What are their best-case and worst-case run time and space complexities? How do they compare to bubble sort, insertion sort, and selection sort?
- Quicksort has a worst-case complexity of O(n²), but has an average complexity of O(n log n). Under what conditions do different variants of quicksort produce its best and worst-case scenarios? What algorithm design choices allow us to avoid the O(n²) case?
- NOTE: A lot of sources on the internet say that quicksort is the fastest sorting algorithm in practice. I can't say for sure if it's true, but I think it's still worth looking into the claims into why it's the best. This should give you some perspective into different factors affecting performance (e.g memory caching) that is more nuanced than simply considering worst-case complexity.
- Understand some non-comparison sorting algorithms: radix sort and bucket sort.
- What makes them different to the comparison-sort algorithms?
- Have a brief look at bogosort and compare it against the other sorting algorithms you've seen. Look up the runtime complexity. Should it perform better or worse than bubble sort?
- Also have a look at sleep sort because it's kinda funny.
- Know how bubble sort, insertion sort, and selection sort work.
- Understand what an abstract data type (ADT) is and how it differs from a data structure.
- Appreciate how ADTs hide complexity behind a simple set of operations.
- NOTE: It's easiest to learn what an ADT is by examining a simple ADT like a stack and how it differs from the underlying data structure and the algorithms that implement its operations.
- Understand the structure and properties of singly linked lists and doubly linked lists, and how to implement efficient algorithms for many basic operations (such as reading, insertion, deletion, and list reversal).
- Compare and contrast linked lists with arrays. What are the advantages and disadvantages of each?
- Have a high-level understanding of how dynamic arrays can be implemented at a low-level. How might you implement insertion and deletion?
- NOTE: Dynamic arrays can be very difficult to appreciate if you aren't programming in something like C because a lot of modern programming languages (such as Python and Javascript) have dynamic arrays built into the language.
- Know the abstract properties and operations of the following ADTs: stack/LIFO, queue/FIFO, and deque.
- Know how to implement these ADTs using an array. Namely, know how to implement a circular buffer and why it's great for queues and deques.
- Know how to implement these ADTs using a linked list.
- Compare and contrast array and linked list implementations. How might you compare their time and space characteristics?
- NOTE: Comparing array and linked list implementations should give you great appreciation for the difference between ADT and underlying implementation, and it should be a very powerful lesson on the profound concept of abstraction.
- Why might it useful to use stack/queue/deque ADTs rather than directly use the underlying array or list? For example, how does it improve our ability to guarantee that our code is working correctly?
- Trees
- Binary Search Trees (BSTs)
- Understand the structure and properties of a binary search tree (BST) and the algorithms for some basic operations (such as search, insertion, deletion, counting elements, and tree-merging).
- Compare and contrast BSTs with arrays and linked lists.
- Understand how various self-balancing BSTs work, namely: splay trees, treaps, AVL trees, 2-3-4 trees, and red-black trees. Compare and contrast them against each other (and against simple unbalanced BSTs).
- NOTE: Feel free to skip learning about treaps until you've learnt about heaps.
- Heaps
- Understand the structure and properties of a heap and its significance in letting us implement an efficient priority queue ADT.
- Know how to implement the basic priority queue operations on a heap (insertion, pop/pull, and checking if the queue is empty).
- Understand the structure and properties of a trie, and identify some applications and why tries are useful for them.
- Appreciate the use of abstract syntax trees (ASTs) for representing the structure of code.
- Binary Search Trees (BSTs)
- Hash Tables
- Understand the structure and properties of a hash table and its significance in letting us implement efficient associative array (a.k.a. dictionaries) and set ADTs.
- Specifically, understand what a hash function is in the context of a hash table. What properties does this function have? (E.g. How does the output of the function relate to the table size? Does the same input always produce the same output?)
- What is a hash collision and collision resolution? How do hash collisions impact performance? How do we design the hash table and hash function in order to reduce the likelihood of this occurring?
- You should also have a look at some possible implementations of hash functions (both good and bad hash functions), and have a high-level understanding of how they work and how effective they are for hash tables.
- Graphs
- Know what graphs are, and familiarize yourself with the common terminology:
- vertex/node, edge, self-loop
- directed graph, undirected graph, weighted graph, multi-graph
- sparse graphs, dense graphs
- subgraphs, adjacent vertices, path
- cycle, connected graphs, complete graphs
- Compare and contrast common graph implementations: array of edges, adjacency list, adjacency matrix, and incidence matrix.
- Understand breadth-first search (BFS) and depth-first search (DFS). How do they differ?
- Understand Kruskal's algorithm and Prim's algorithm for calculating the minimum spanning tree. How do they differ?
- Understand Dijkstra's algorithm and A* for calculating the shortest path between two vertices/nodes.
- Understand how A*'s heuristic function works, and have an intuition around why it improves the performance of Dijkstra's algorithm.
- Why do we need the heuristic to be admissible?
- Also look into greedy best-first search. How does it behave? Why is this an incomplete solution for calculating shortest path?
- Have a look at some more famous graph problems. You don't necessarily need solve them. You just need to understand what the problem means and have a look at some implications and real-world applications for each problem.
- Hamilton path problem
- Euler path problem (and the Seven Bridges of Königsberg)
- Travelling salesman problem
- Vertex colouring problem
- How to find out if two graphs equivalent/isomorphic?
- How many connected subgraphs are there?
- Are two vertices in the same connected subgraph?
- Can a graph be drawn on a 2D plane without any edges crossing?
- Know what graphs are, and familiarize yourself with the common terminology:
Optional further topics that I recommend looking into to further expand your perspective on how these concepts can be applied:
- copy-on-write (COW)
- B-trees
- Skip lists
- Distributed Hash Tables (DHTs)
- Compression
- Huffman Coding
- LZW
- Lossy and lossless compression.
- Watch this video on the 100 Prisoners Riddle by Veritasium.
Optional advanced topics:
- Greedy Strategies
- Divide-And-Conquer Strategies
- Karatsuba Multiplication (for large integer multiplication)
- Fast Fourier Transform (FFT)
- Dynamic Programming
- Maximum Flow
- String Matching Algorithms
- The P=NP Problem
Want some programming practice? Here are my recommendations:
- You may need to create accounts on these websites to see the exercises.
- LeetCode's Data Structure Study Plan
- These exercises should be your priority to complete first because they cover the most fundamental things.
- As of November 2022, I see that the first two study plans are free to access, but the third study plan requires a paid subscription of $35 USD per month. I suggest just going through the free questions. You don't need the paid ones.
- Go through as many of these questions as you can. If you're having trouble with a question, skip it and do it later.
- HackerRank's Data Structures exercises
- Once you're done with all the free content from LeetCode's data structures study plan, you should have a look at HackerRank's data structures exercises. Feel free to skip over any exercises you've already done in LeetCode (or are too easy).
- Same advice as as with LeetCode data structures: Just go through as many questions as you can, and if you're stuck with a question, skip it and do it later.
- LeetCode's Algorithms exercises
- If data structures exercises are too hard or you're getting bored of them, I suggest just going through LeetCode's algorithms exercises. Feel free to skip anything you've already done.
- Same advice again: Just go through as many questions as you can, and if you're stuck with a question, skip it and do it later.
-
If you have money to blow, maybe check out AlgoExpert.
- It's quite expensive (as of November 2022, $99 USD for 1 year of access), but the one person I know who uses it likes it.
- In my opinion, the main value proposition of AlgoExpert is the fact that they include video lectures with the questions to help you along. The problem with LeetCode and HackerRank is that they don't really explain the questions and ideas in depth, so if you're stuck, you're left on your own to find relevant videos on YouTube or read the community forums/discussions.
- I don't consider it necessary to pay this much money for it, but it's up to you.
- Include screenshots of these websites to give people a better idea of what I'm referring to.
Object Oriented Programming
- PREREQUISITES: Programming Fundamentals
- Add more content.
Networking
- PREREQUISITES: Data Structures and Algorithms
- Add more content.
Web Development, Databases, and Web Security
- PREREQUISITES: Programming Fundamentals; Networking
- RECOMMENDED PREREQUISITES: Essential Miscellany
- Add more content.
Optional further reading:
- Watch Why web tech is like this by Steve Sanderson. It's a 1-hour summary of the history of web technology, and it's amazing.
Operating Systems
- PREREQUISITES: System-Level Fundamentals; Data Structures and Algorithms; Essential Miscellany
- Add more content.
[OPTIONAL] Pure Functional Programming Fundamentals
Not gonna lie, I haven't breached this topic myself, but I expect it to be great for expanding your understanding (and breaking down your assumptions) of what programming languages can look like and do. You can get away with never trying pure functional for your whole career, but it may be worth it anyway. Considering that this is a topic I'm unfamiliar with, take my comments on this with a grain of salt.
- Add more content.
[OPTIONAL] Further Mathematics
Going above and beyond with math is a great asset for a programmer, giving you practice in thinking abstractly, and giving you a foundation in modelling all sorts of phenomena. On top of that, some things really do require the advanced stuff, namely calculus and linear algebra for computer graphics work and machine learning.
- Add more content.
[OPTIONAL] Cloud, Containers, and Container Orchestration
Highly recommended to look into this if you're interested in web development.
- PREREQUISITES: Web Development, Databases, and Web Security
- RECOMMENDED PREREQUISITES: Essential Miscellany
- Add more content.
[OPTIONAL] Compilers and Interpreters
- Add more content.
[OPTIONAL] Artificial Intelligence and Machine Learning
- PREREQUISITES: Data Structures and Algorithms
- Add more content.
[OPTIONAL] Computer Graphics, Physics, and Game Development
- PREREQUISITES: Further Mathematics; Data Structures and Algorithms
- Add more content.
[OPTIONAL] Digital Circuits, Computer Architecture, and Custom Hardware Acceleration
I'm actually not entirely sure how to approach listing out for these topics. It probably may as well be its own big checklist, including topics on electrical circuit analysis and transistors. Feels too big to treat it as simply another section of this programmer's checklist.
[OPTIONAL] Programming Languages
Python
- PREREQUISITES: Programming Fundamentals
- COREQUISITES: Data Structures and Algorithms; Object Oriented Programming
- Core libraries you must know really well:
- itertools: General-purpose iterators.
- Have a skim through these core libraries:
- bisect: For inserting into a sorted list without needing to sort after every insertion.
- You don't need to memorize every single one of these libraries that deeply. Just skim through the documentation and understand the basic capabilities so that when you do encounter a use for it, you'll know what to look up!
- Add more content.
C++
Be warned: C++ is a massive beast of a language that's very different to most popular imperative languages such as Javascript or Python. While you can probably still go quite far without knowing how manual memory management works, neglect it for too long and you'll eventually meet perplexing bugs, mysteriously terrible performance, and deeply flawed software design. It is very easy to shoot yourself in the foot here. That's not to say you shouldn't learn C++ or Rust though because they take a very different approach to most mainstream languages that will greatly expand your understanding of what a programming language can do and the performance tradeoffs under the hood of many languages.
But of course, C++ isn't just an educational language. It's a highly practical language for writing high-performance software or highly-constrained computing (such as embedded systems), and is also quite widely used in game development and competitive programming. And once you get comfortable with the language, it's honestly 99% as easy to use as Java (at least, if we ignore importing, linking, and dependency management...). It's a very rewarding and fun language to get used to.
While I don't consider C to be a strict prerequisite, I very highly recommend learning C and doing a wide range of data structures and algorithms exercises in C before starting C++. Doing all sorts of HackerRank exercises and such is amazing for drilling in the most critical concepts in C++. But if you don't want to go the C route, you can still learn everything you need starting with C++ (and that knowledge should actually carry to C if you ever need to do C).
- PREREQUISITES: Data Structures and Algorithms; Object Oriented Programming; C Language Fundamentals
- Know pointers and references, and the differences between them.
- Know lvalue/rvalue, copy/move semantics, specifically copy/move constructors. Know how all these concepts relate to each other and the differences between them.
- Know the genenral overload resolution rules regarding lvalue/rvalue. For example, [I need to brush up on my C++ to come up with the proper wording for this lol]
- Understand std::unique_ptr and std::shared_ptr, how they improve memory safety, and how it relates to lvalue/rvalue and copy/move semantics.
- Add more content.
Acknowledgements
To help me compile this list, I borrowed from a bunch of sources:
- My own experience studying undergraduate Computer Engineering at UNSW
- OSSU Computer Science