This page collects my thoughts on some of the courses I took at UBC as an undergraduate (B.Sc. in Combined Honours Computer Science and Statistics, 2013–2018) as well as my answers to some frequently asked questions regarding statistics, computer science and math courses. Prospective and current students interested in such courses may find the information here helpful when picking courses to take.
Disclaimer: courses and degree requirements have changed since I completed my undergraduate degree. Please see the academic calendar for the most up-to-date information. Any opinions expressed on this page are my own.
Term 1 course | Note | Term 2 course | Note
---|---|---|---
CPSC 110 | | BIOL 121 |
ENGL 112 | | CPSC 121 |
MATH 104 | no longer offered¹ | MATH 105 | no longer offered
PHYS 101 | no longer offered | PHYS 102 | no longer offered
PSYC 101 | arts elective² | SCIE 113 |

¹ Course is no longer offered (and is, in most cases, replaced by some new course)
² Elective course taken to satisfy the arts requirement for a B.Sc.
There is not much to say about my first-year courses, which are fairly standard and structured to satisfy the lower-level requirements for a B.Sc. Note that a number of these courses no longer exist and have since been replaced by other new courses.
CPSC 110 (Computation, Programs, and Programming): a common question prospective CPSC students have is whether to take CPSC 110 or to take CPSC 103 (Introduction to Systematic Program Design) + CPSC 107 (Systematic Program Design). My impression is that CPSC 110 and CPSC 103 + CPSC 107 cover roughly the same topics. Neither option assumes previous programming experience. CPSC 103 appears to be intended for non-CPSC students, with CPSC 107 being a bridging course that gives these students the option to take further CPSC courses if they decide to. Students who intend to major in computer science should probably take CPSC 110 as the direct path. Another option is taking the challenge exam for CPSC 110, where obtaining the minimum grade allows you to earn the credits without needing to take the course.
SCIE 113 (First-Year Seminar in Science): this course satisfies 3 of the 6 communication credits required for a B.Sc. Note that SCIE 113 can only be taken when you have first-year standing. If you are planning to take SCIE 113, you should do so in the first year of your degree.
If you are not planning on doing anything over the summer, taking a course or two is a great way to lessen your workload during the winter terms. There are two summer terms that each span two months (May–Jun and Jul–Aug), so each is approximately half the length of a winter term. This means that you can expect the pacing and workload of a summer course to be roughly equivalent to that of two courses in the winter term. For this reason, I recommend taking no more than two courses in a single summer term.
CPSC 210 (Software Construction): for (prospective) CPSC students, taking CPSC 210 before your second year can give you more flexibility with your second-year schedule, as CPSC 210 is a prerequisite for CPSC 213 and CPSC 221. With CPSC 210 out of the way, you then have the option of taking CPSC 213 or CPSC 221 (or both) in the first term. This is particularly helpful considering that students with second-year standing register last among all undergraduate students. It is likely that your first choice of course sections will be full before you can register, and so the more options you have for backups, the better.
Term 1 course | Note | Term 2 course | Note
---|---|---|---
CPSC 221 | | CPSC 213 |
KORN 102 | arts elective³ | KORN 102 | arts elective³
MATH 200 | | MATH 221 |
MATH 220 | | PHIL 321A | arts elective
STAT 200 | | STAT 302 |

³ Course spans both Term 1 and Term 2
CPSC 221 (Basic Algorithms and Data Structures): students in statistics (or really any non-CPSC discipline) who are debating between CPSC 221 and CPSC 213 will likely want to take CPSC 221. CPSC 221 introduces the basics of algorithm analysis (in particular, big O notation) that will stick with you whenever you write code. Understanding these concepts is especially important when dealing with the large datasets that are common in data science and machine learning, where algorithm runtime can make or break a proposed solution. Upper-level CPSC courses related to these fields will also generally require CPSC 221 as a prerequisite. Another good reason to take CPSC 221 is that technical interviews will often include questions that draw on topics covered in the course.
MATH 200 (Calculus III) / MATH 221 (Matrix Algebra): students who are interested in statistics or machine learning should take both of these courses (or their equivalents). MATH 200 covers multivariate calculus, which is a prerequisite to probability (the foundation of all statistical methodology) as well as to numerical optimization (the approach underlying many modern machine learning methods). MATH 221 covers basic linear algebra, which is similarly necessary for simplifying methods applied to problems in statistics/machine learning that involve more than one variable (which includes basically any real problem in practice).
MATH 220 (Mathematical Proof): students who are interested in pursuing graduate studies in a discipline heavily based on mathematical theory (e.g., statistics, machine learning, algorithm research, etc.) should take MATH 220 or an honours math course at some point in their degree. Analogous to the role of empirical evidence in communicating scientific research, proofs are how one researcher convinces another that their mathematical theory is correct. Being able to understand and construct your own proofs is a must in relevant disciplines.
STAT 200 (Elementary Statistics for Applications) / STAT 302 (Introduction to Probability): students who are interested in taking upper-level statistics courses should take both STAT 200 and STAT 302. The combination of the two serves as a prerequisite for most of the other statistics courses in the department. Some majors such as computer science accept STAT 251 in place of STAT 200 + STAT 302. However, STAT 251 alone does not satisfy prerequisites for upper-level courses and so you should stick with STAT 200 + STAT 302 if you plan on taking more statistics courses. Note that most majors which require STAT 302 will also accept MATH 302 as a substitute. Both courses generally cover the same topics. The main difference between the two is the department offering the course, and consequently the teaching faculty and the emphasis on theory versus applications will also differ.
CPSC 304 (Introduction to Relational Databases): both computer science and statistics students may find CPSC 304 useful for jobs in industry. Most if not all organizations that collect and store data will do so using a database. CPSC 304 teaches SQL (pronounced S-Q-L or “sequel”), which is the primary programming language used to interact with (relational) databases. Statistics students will also benefit from learning the query logic in SQL, which is applicable to general dataset manipulation tools that can be found across statistical software (such as in the tidyverse package in R). SQL is generally not difficult to learn on your own, but learning it through CPSC 304 is a nice way to earn credits (and to learn about other aspects of databases, such as design and good practices).
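To make the point about query logic concrete, here is a small sketch of my own (not taken from CPSC 304). The table name `grades`, the toy rows and both function names are made up for illustration. It computes a per-group average twice: once with a SQL `GROUP BY` query through Python's built-in `sqlite3` module, and once in plain Python. The second version is the same split-then-summarize idea that grouped summaries in data manipulation tools express (for example, `group_by()` followed by `summarise()` in dplyr).

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy data invented for illustration: (course, grade) pairs.
ROWS = [
    ("STAT 200", 80), ("STAT 200", 90),
    ("CPSC 304", 70), ("CPSC 304", 75), ("CPSC 304", 95),
]


def mean_by_course_sql(rows):
    """Average grade per course using a SQL GROUP BY query."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE grades (course TEXT, grade REAL)")
    conn.executemany("INSERT INTO grades VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT course, AVG(grade) FROM grades GROUP BY course ORDER BY course"
    ).fetchall()
    conn.close()
    return result


def mean_by_course_python(rows):
    """The same logic by hand: split rows into groups, then summarize each."""
    groups = defaultdict(list)
    for course, grade in rows:
        groups[course].append(grade)
    return sorted((course, sum(g) / len(g)) for course, g in groups.items())


print(mean_by_course_sql(ROWS))
print(mean_by_course_python(ROWS))
```

Both functions return the same list of (course, average) pairs, which is the sense in which the logic carries over between SQL and other tools.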
Co-op (short for co-operative education) is an optional program where students take academic terms off to work full-time in paid jobs (ideally) related to what they are studying or what they hope to pursue. Students typically apply to co-op in their second or third year depending on their program and, if accepted, participate in a few workshops the following term, apply to jobs the next, and start their first co-op after securing a job. When I did co-op, we were required to complete four co-op terms, which for me were the winter and summer terms of my third year at UBC and the summer term of my fourth year at UBC. Note that co-op is open to both domestic and international students, and that there are fees for the workshops and each co-op term.
I personally found co-op helpful in that it simplified and streamlined much of the job-searching process, which could have been overwhelming for someone with no experience doing so. The co-op program maintains its own list of open job postings (though students are also free to find opportunities on their own). The program also includes co-ordinators who support the students in finding jobs, answer questions and provide guidance, and run skill-based workshops (e.g., writing resumes). Applying to jobs as part of the co-op program also offers an unspoken perk in that expectations of you as an employee are tempered, meaning that employers recognize that you are ultimately a student and that you are there to learn (co-op specific positions generally involve some funding from the government, and so subsidized labour is one way companies that recruit co-ops benefit from the program). In terms of the co-op fees, the compensation I received from my positions was enough to pay off the fees (and the remainder of the domestic tuition for my degree).
You can find the jobs I worked for co-op in my CV or on my LinkedIn.
Term 1 course | Note | Term 2 course | Note
---|---|---|---
CPSC 302 | | CPSC 303 |
CPSC 310 | | CPSC 320 |
CPSC 340 | elective⁴ | MATH 303 |
MATH 307 | | STAT 305 |
MATH 320 | honours⁵ | STAT 306 |

⁴ Course taken to satisfy an upper-level restricted or free elective requirement
⁵ Course intended for students in honours degrees
CPSC 302 (Numerical Computation for Algebraic Problems) / CPSC 303 (Numerical Approximation and Discretization): students who are in quantitative disciplines and who are looking for a CPSC elective may want to consider CPSC 302 and/or CPSC 303. The two courses together cover various aspects of numerical computation, such as what issues we run into when we try to do math on a computer and what methods we could use to address these issues. CPSC 302 focuses on topics related to linear systems (e.g., conditioning, matrix decompositions) whereas CPSC 303 focuses on topics that involve discretization (e.g., integrals, differential equations).
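As a rough illustration of the kind of issue these courses deal with (a toy example of my own, not taken from the course materials), the sketch below shows two things: floating-point rounding, and a poorly conditioned linear system where a tiny change in the right-hand side produces a large change in the solution. The helper `solve_2x2` and the specific numbers are hypothetical.

```python
import math


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system with Cramer's rule (illustration only)."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# 1) Rounding: 0.1, 0.2 and 0.3 are not exactly representable in binary,
#    so an exact equality test fails even though the math says it should hold.
exact_equal = (0.1 + 0.2 == 0.3)             # False
close_enough = math.isclose(0.1 + 0.2, 0.3)  # True

# 2) Conditioning: the two rows of this system are nearly parallel.
#    Changing b2 by 0.0001 moves the solution from about (1, 1) to about (0, 2).
x = solve_2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0001)
y = solve_2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0002)

print(exact_equal, close_enough)
print(x, y)
```

The second part is a property of the problem itself rather than of the solver, so a better algorithm alone cannot remove that sensitivity.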
CPSC 320 (Intermediate Algorithm Design and Analysis): CPSC 320 is a natural continuation of CPSC 221 and so the reasons for taking it are similar. One topic that is introduced in CPSC 320 is NP-hardness, which is occasionally referenced in the machine learning literature.
CPSC 340 (Machine Learning and Data Mining): students who are interested in machine learning will want to take CPSC 340. Note that the course has many prerequisite courses, all of which are typically strictly enforced due to the popularity of the course. I cannot comment on CPSC 330 (Applied Machine Learning) as the course did not exist during my undergrad, but my impression is that CPSC 340 goes into more depth about the theory whereas CPSC 330 remains fairly surface level and emphasizes applications. Students who are considering a career that potentially involves machine learning should prioritize CPSC 340. Also see this comparison of the two written by instructors who actually taught the courses. Note that CPSC 340 is cross-listed with CPSC 532M and so graduate students should take CPSC 532M instead for graduate credit.
MATH 303 (Introduction to Stochastic Processes): students who are interested in advanced statistics or machine learning should consider taking MATH 303. The course introduces Markov chains, which are at the core of the Markov chain Monte Carlo (MCMC) methods that are commonly used in Bayesian statistics and certain approaches in machine learning.
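For readers who have not seen a Markov chain before, here is a minimal sketch (my own toy example, with a made-up two-state transition matrix `P`). It finds the chain's long-run (stationary) distribution in two ways: by repeatedly applying the transition matrix, and by simulating the chain and counting how often each state is visited. MCMC relies on the same idea, except that the chain is constructed so that its stationary distribution is the distribution you want to sample from.

```python
import random

# Hypothetical transition matrix: P[i][j] is the probability of moving
# from state i to state j. Each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]


def stationary_by_iteration(P, steps=200):
    """Start in state 0 and push the distribution through P many times."""
    n = len(P)
    dist = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


def simulate_frequencies(P, n_steps=100_000, seed=0):
    """Run the chain and record the fraction of time spent in each state."""
    rng = random.Random(seed)
    n = len(P)
    state = 0
    counts = [0] * n
    for _ in range(n_steps):
        counts[state] += 1
        state = rng.choices(range(n), weights=P[state])[0]
    return [c / n_steps for c in counts]


print(stationary_by_iteration(P))  # close to [5/6, 1/6]
print(simulate_frequencies(P))     # roughly the same, up to simulation noise
```

For this particular matrix the stationary distribution can be checked by hand: it must satisfy 0.1·π₀ = 0.5·π₁, which gives π = (5/6, 1/6).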
MATH 307 (Applied Linear Algebra): MATH 307 builds on the knowledge learned in MATH 221 and focuses on applications of the theory. Covered topics generally vary depending on the instructor but usually include splines and matrix decompositions. It is likely that you will encounter many of these topics at least once in your statistics or machine learning career and so it helps to have been introduced to them beforehand.
MATH 320 (Real Variables I): as with MATH 220, students who are considering graduate studies in a theory-heavy discipline should strongly consider taking MATH 320 and its follow-up courses (MATH 321, MATH 420, etc.). Many theoretical results introduced in the statistics and machine learning literature assume some knowledge of real analysis, and trying to understand these results without some background in analysis can be frustrating. While real analysis has a reputation for being one of the most difficult undergraduate courses, it is one of those topics where the payoff is high (assuming you pursue graduate studies in a quantitative field). These days, I find that I often read a research paper and then wish that I had taken more math (analysis, measure theory, etc.) during my undergraduate years.
STAT 305 (Introduction to Statistical Inference) / STAT 306 (Finding Relationships in Data): students who are interested in machine learning should consider taking STAT 305 and STAT 306. Both are core courses in the statistics curriculum at UBC. STAT 305 builds on the idea of statistical inference that is discussed in STAT 200. Importantly, it is also one of the few undergraduate courses that introduce the Bayesian paradigm, an inference perspective that is quickly becoming a necessity in any statistician or machine learning scientist’s toolkit. STAT 306 covers the basics of regression, which is the most common approach for analyzing relationships between two or more variables in practice.
See 2015–Summer 2016: Co-op for my thoughts on co-op.
Term 1 course | Note | Term 2 course | Note
---|---|---|---
CPSC 420 | elective | CPSC 404 | elective
MATH 340 | elective | CPSC 406 | elective
STAT 404 | | CPSC 540 | elective⁶
STAT 406 | elective | STAT 443 | elective
STAT 460 | honours | STAT 461 | honours

⁶ Graduate course
CPSC 404 (Advanced Relational Databases): whereas CPSC 304 focuses on using and interacting with databases (i.e., SQL), CPSC 404 focuses on the structural mechanisms underlying databases (e.g., hardware aspects, indexing, optimization, etc.). Students who are potentially interested in working as a database developer will find CPSC 404 useful. For students primarily interested in statistics and related fields, other CPSC courses may be more relevant.
CPSC 406 (Computational Optimization): students who are interested in advanced statistics or machine learning should consider taking CPSC 406. CPSC 406 touches on various computational optimization techniques that are frequently used in modern methods in statistics and machine learning. There is a little overlap with CPSC 302 and CPSC 303, but the courses otherwise complement each other well. For students who are interested in these fields but have limited CPSC electives, I would prioritize CPSC 406 over CPSC 302 and CPSC 303.
CPSC 420 (Advanced Algorithms Design and Analysis): unlike CPSC 320, the topics covered in CPSC 420 tend to be less structured across years and vary according to the instructor. Topics covered when I took the course included cuckoo hashing, network flow, game theory and approximation algorithms, among others. Students who enjoyed CPSC 320 may also enjoy CPSC 420. I will comment that while the content was interesting, the iteration of CPSC 420 that I took featured a module-based pass/fail grading scheme that I did not appreciate. I believe that this grading scheme has not been reused in more recent iterations of CPSC 420, and so you may have a better experience with the course than I did.
CPSC 540 (Machine Learning): CPSC 540 is a continuation of CPSC 340 but focuses on more research-oriented topics. Note that CPSC 540 has been restructured since I took the course and is now cross-listed with CPSC 440. Details about CPSC 540 (as well as complete lecture slides) can be found on Mark Schmidt’s course page (linked is the 2020W iteration of CPSC 540, but I do not expect the content to change significantly between years). Also linked on the page is Geoff Roeder’s write-up that compares CPSC 340/CPSC 540 with several STAT courses.
MATH 340 (Introduction to Linear Programming): MATH 340 is all about linear programming, which deals with the problem of optimizing a linear objective subject to linear constraints and has wide-reaching applications. The course itself is not the most exciting, however, and can be a bit dry. MATH 340 would be suitable for students looking for a relatively easy MATH elective.
STAT 404 (Design and Analysis of Experiments): STAT 404 is a core course in the statistics curriculum at UBC. As its name suggests, STAT 404 covers design of experiments, which addresses the question of how to design a controlled experiment that allows for meaningful statistical analysis while being data-efficient. Any graduate student who may need to design or analyze a study (e.g., experiments, clinical trials, etc.) will find STAT 404 useful. The course may also appeal to students planning a career in UX/UI, where A/B testing may be a common task on the job.
STAT 406 (Methods for Statistical Learning): students who are interested in machine learning may also want to take STAT 406. The topics covered in STAT 406 overlap with those of CPSC 340, but STAT 406 places more emphasis on statistical perspectives such as bias-variance tradeoffs. The courses complement each other well and can be taken together.
STAT 443 (Time Series and Forecasting): students who are interested in applied statistics may want to consider STAT 443. STAT 443 introduces methods for analyzing time series data, which occur in many applications. A time series consists of data collected sequentially in time, and the temporal aspect introduces potential correlation between data points that violates the assumption of independent and identically distributed observations made by usual statistical methods.
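The violation of independence mentioned above is easy to see in a simulation. The sketch below is my own illustration (not from STAT 443) and uses a first-order autoregressive, or AR(1), process as the example of a time series. The function names and the coefficient 0.8 are arbitrary choices. It compares the lag-1 sample autocorrelation of an AR(1) series with that of independent noise.

```python
import random


def simulate_ar1(n, phi, seed=0):
    """Simulate x[t] = phi * x[t-1] + noise, with standard normal noise.

    With phi = 0 this reduces to independent observations after the
    starting value.
    """
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x


def lag1_autocorrelation(x):
    """Sample correlation between each observation and the one before it."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


dependent = lag1_autocorrelation(simulate_ar1(5000, phi=0.8))
independent = lag1_autocorrelation(simulate_ar1(5000, phi=0.0))
print(dependent)    # near 0.8: neighbouring observations move together
print(independent)  # near 0: no dependence between neighbours
```

Methods that assume independent observations treat the first series as if it carried as much information as the second, which is one reason dedicated time series methods are needed.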
STAT 460 (Statistical Inference I) / STAT 461 (Statistical Inference II): students who are interested in doing a graduate degree in statistics should strongly consider taking STAT 460 and STAT 461. STAT 460 and STAT 461 are the only honours courses in the statistics curriculum at UBC and are cross-listed with STAT 560 and STAT 561, respectively (if you take STAT 460 + STAT 461 and then do a statistics graduate degree at UBC, you do not need to take STAT 560 + STAT 561, which are typically degree requirements for the graduate programs). The two courses together provide a broad introduction to mathematical statistics and cover a wide range of topics which—depending on the instructor(s)—may include more rigorous treatments of topics already seen in other STAT courses (e.g., hypothesis testing, convergence) and topics generally not found in other undergraduate courses (e.g., resampling). Rumour has it that the department graduate admissions committee looks fondly on applicants who have completed STAT 460 + STAT 461 with good grades.
What other STAT courses would you recommend taking?
For students who intend to pursue a career in statistics, STAT 450 (Case Studies in Statistics) can be considered a capstone course in which students work as consultants and provide statistical help to a real client with a statistical problem. One important thing that is often not touched on in other undergraduate STAT courses is how a statistician works with others (particularly, non-statisticians). Through STAT 450, students experience the entire process: asking the client questions to understand their statistical problem, identifying possible methods that suitably address the client’s problem, and explaining the proposed methods to the client in a way that they can understand. Note that STAT 450 is a small course and so registration for the course may be competitive.
What are your thoughts on DSCI XXX or STAT 201/STAT 301?
I cannot comment on these courses as they did not exist when I was an undergraduate. My impression is that these courses would be of interest for students who are looking to pursue a career in statistics or general data science in industry. For students who are interested in machine learning and/or graduate studies, the priority should still be on traditional STAT/MATH/CPSC courses that build up your mathematical and computational skillset (though the data science courses will provide a nice complement to these traditional courses if you have elective credits).