Google Interview University
What is it?
This is my multi-month study plan for going from web developer (self-taught, no CS degree) to Google software engineer.
This long list has been extracted and expanded from Google's coaching notes, so these are the things you need to know. There are extra items I added at the bottom that may come up in the interview or be helpful in solving a problem. Many items are from Steve Yegge's "Get that job at Google" and are reflected sometimes word-for-word in Google's coaching notes.
Table of Contents
- What is it?
- Why use it?
- How to use it
- Get in a Googley Mood
- Did I Get the Job?
- Follow Along with Me
- Don't feel you aren't smart enough
- About Google
- About Video Resources
- Interview Process & General Interview Prep
- Pick One Language for the Interview
- Before you Get Started
- What you Won't See Covered
- Prerequisite Knowledge
- The Daily Plan
- Algorithmic complexity / Big-O / Asymptotic analysis
- Data Structures
- More Knowledge
- Trees
- Sorting
- Graphs
- Even More Knowledge
- Final Review
- Books
- Coding exercises/challenges
- Once you're closer to the interview
- Your Resume
- Be thinking of for when the interview comes
- Have questions for the interviewer
- Once You've Got The Job
---------------- Everything below this point is optional ----------------
- Additional Learning
- Unicode
- Endianness
- Emacs and vi(m)
- Unix command line tools
- Information theory
- Parity & Hamming Code
- Entropy
- Cryptography
- Compression
- Networking
- Computer Security
- Garbage collection
- Parallel Programming
- Design patterns
- Messaging, Serialization, and Queueing Systems
- Fast Fourier Transform
- Bloom Filter
- van Emde Boas Trees
- Augmented Data Structures
- Skip lists
- Network Flows
- Disjoint Sets & Union Find
- Math for Fast Processing
- Treap
- Linear Programming
- Geometry, Convex hull
- Discrete math
- Machine Learning
- Go
- Additional Detail on Some Subjects
- Video Series
- Computer Science Courses
Why use it?
I'm following this plan to prepare for my Google interview. I've been building the web, building services, and launching startups since 1997. I have an economics degree, not a CS degree. I've been very successful in my career, but I want to work at Google. I want to progress into larger systems and get a real understanding of computer systems, algorithmic efficiency, data structure performance, low-level languages, and how it all works. And if you don't know any of it, Google won't hire you.
When I started this project, I didn't know a stack from a heap, didn't know Big-O anything, anything about trees, or how to traverse a graph. If I had to code a sorting algorithm, I can tell ya it wouldn't have been very good. Every data structure I've ever used was built into the language, and I didn't know how they worked under the hood at all. I've never had to manage memory, unless a process I was running would give an "out of memory" error, and then I'd have to find a workaround. I've used a few multi-dimensional arrays in my life and thousands of associative arrays, but I've never created data structures from scratch.
But after going through this study plan I have high confidence I'll be hired. It's a long plan. It's going to take me months. If you are familiar with a lot of this already it will take you a lot less time.
How to use it
Everything below is an outline, and you should tackle the items in order from top to bottom.
I'm using GitHub's special markdown flavor, including task lists to check progress.
- Create a new branch so you can check items like this, just put an x in the brackets: [x]
More about GitHub-flavored markdown
Get in a Googley Mood
Print out a "future Googler" sign (or two) and keep your eyes on the prize.
Did I Get the Job?
I haven't applied yet.
I still have a few days in the learning phase (finishing up this crazy list), and starting next week all I'll be doing is programming questions all day long. That will continue for a few weeks, and then I'll apply through a referral I've been holding onto since February (yes, February).
Thanks for the referral, JP.
Follow Along with Me
I'm on the journey, too. Follow along on my blog at GoogleyAsHeck.com
- Twitter: @googleyasheck
- Twitter: @StartupNextDoor
- Google+: +Googleyasheck
- LinkedIn: johnawasham
Don't feel you aren't smart enough
- Google engineers are smart, but many have an insecurity that they aren't smart enough, even though they work at Google.
- The myth of the Genius Programmer
About Google
- For students - Google Careers: Technical Development Guide
- How Search Works:
- Series:
- Book: How Google Works
- Made by Google announcement - Oct 2016 (video)
About Video Resources
Some videos are available only by enrolling in a Coursera, EdX, or Lynda.com class. These are called MOOCs. Enrolling is free, but sometimes the classes are not in session, so you may have to wait a couple of months and have no access in the meantime.
I'd appreciate your help converting the MOOC video links to public sources to replace the online course videos over time. I like using university lectures.
Interview Process & General Interview Prep
-
Videos:
-
Articles:
- Becoming a Googler in Three Steps
- Get That Job at Google
- all the things he mentions that you need to know are listed below
- (very dated) How To Get A Job At Google, Interview Questions, Hiring Process
- Phone Screen Questions
-
Additional (not suggested by Google but I added):
- ABC: Always Be Coding
- Four Steps To Google Without A Degree
- Whiteboarding
- How Google Thinks About Hiring, Management And Culture
- Effective Whiteboarding during Programming Interviews
- Cracking The Coding Interview Set 1:
- How to Get a Job at the Big 4:
- Failing at Google Interviews
Pick One Language for the Interview
I wrote this short article about it: Important: Pick One Language for the Google Interview
You can use a language you are comfortable in to do the coding part of the interview, but for Google, these are solid choices:
- C++
- Java
- Python
You could also use these, but read around first. There may be caveats:
- JavaScript
- Ruby
You need to be very comfortable in the language, and be knowledgeable.
Read more about choices:
- http://www.byte-by-byte.com/choose-the-right-language-for-your-coding-interview/
- http://blog.codingforinterviews.com/best-programming-language-jobs/
- https://www.quora.com/What-is-the-best-language-to-program-in-for-an-in-person-Google-interview
You'll see some C, C++, and Python learning included below, because I'm learning. There are a few books involved, see the bottom.
Before you Get Started
This list grew over many months, and yes, it kind of got out of hand.
Here are some mistakes I made so you'll have a better experience.
1. You Won't Remember it All
I watched hours of videos and took copious notes, and months later there was much I didn't remember. I spent 3 days going through my notes and making flashcards so I could review (see below).
2. Use Flashcards
To solve the problem, I made a little flashcards site where I could add flashcards of 2 types: general and code. Each card has different formatting.
I made a mobile-first website so I could review on my phone and tablet, wherever I am.
Make your own for free:
- Flashcards site repo
- My flash cards database: Keep in mind I went overboard and have cards covering everything from assembly language and Python trivia to machine learning and statistics. It's way too much for what's required by Google.
Note on flashcards: The first time you recognize you know the answer, don't mark it as known. You have to see the same card and answer it several times correctly before you really know it. Repetition will put that knowledge deeper in your brain.
3. Review, review, review
I keep a set of cheatsheets on ASCII, OSI stack, Big-O notations, and more. I study them when I have some spare time.
Take a break from programming problems for a half hour and go through your flashcards.
4. Focus
There are a lot of distractions that can take up valuable time. Focus and concentration are hard.
What you won't see covered
This big list all started as a personal to-do list made from Google interview coaching notes. These are prevalent technologies but were not mentioned in those notes:
- SQL
- JavaScript
- HTML, CSS, and other front-end technologies
The Daily Plan
Some subjects take one day, and some will take multiple days. Some are just learning with nothing to implement.
Each day I take one subject from the list below, watch videos about that subject, and write an implementation in:
- C - using structs and functions that take a struct * and something else as args.
- C++ - without using built-in types
- C++ - using built-in types, like STL's std::list for a linked list
- Python - using built-in types (to keep practicing Python)
- and write tests to ensure I'm doing it right, sometimes just using simple assert() statements
- You may do Java or something else, this is just my thing.
Why code in all of these?
- Practice, practice, practice, until I'm sick of it, and can do it with no problem (some have many edge cases and bookkeeping details to remember)
- Work within the raw constraints (allocating/freeing memory without help of garbage collection (except Python))
- Make use of built-in types so I have experience using the built-in tools for real-world use (not going to write my own linked list implementation in production)
I may not have time to do all of these for every subject, but I'll try.
You can see my code here:
- [C](https://github.com/jwasham/practice-c)
- [C++](https://github.com/jwasham/practice-cpp)
- [Python](https://github.com/jwasham/practice-python)
You don't need to memorize the guts of every algorithm.
Write code on a whiteboard, not a computer. Test with some sample inputs. Then test it out on a computer.
Prerequisite Knowledge
-
How computers process a program:
-
Compilers
-
How floating point numbers are stored:
Algorithmic complexity / Big-O / Asymptotic analysis
-
nothing to implement
-
Big O Notation (and Omega and Theta) - best mathematical explanation (video)
-
Skiena:
-
TopCoder (includes recurrence relations and master theorem):
-
If some of the lectures are too mathy, you can jump down to the bottom and watch the discrete mathematics videos to get the background knowledge.
Data Structures
-
Arrays
- Implement an automatically resizing vector.
- Description:
- Implement a vector (mutable array with automatic resizing):
- Practice coding using arrays and pointers, and pointer math to jump to an index instead of using indexing.
- new raw data array with allocated memory
- can allocate int array under the hood, just not use its features
- start with 16, or if starting number is greater, use power of 2 - 16, 32, 64, 128
- size() - number of items
- capacity() - number of items it can hold
- is_empty()
- at(index) - returns item at given index, blows up if index out of bounds
- push(item)
- insert(index, item) - inserts item at index, shifts that index's value and trailing elements to the right
- prepend(item) - can use insert above at index 0
- pop() - remove from end, return value
- delete(index) - delete item at index, shifting all trailing elements left
- remove(item) - looks for value and removes index holding it (even if in multiple places)
- find(item) - looks for value and returns first index with that value, -1 if not found
- resize(new_capacity) // private function
- when you reach capacity, resize to double the size
- when popping an item, if size is 1/4 of capacity, resize to half
- Time
- O(1) to add/remove at end (amortized for allocations for more space), index, or update
- O(n) to insert/remove elsewhere
- Space
- contiguous in memory, so proximity helps performance
- space needed = (array capacity, which is >= n) * size of item, but even if 2n, still O(n)
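The growth and shrink rules above can be sketched in a few lines. This is a minimal Python sketch, not my C implementation: a fixed-length list stands in for the raw allocated block, the class name `Vector` and the private `_resize` helper are illustrative, and never shrinking below the starting capacity of 16 is my own assumption.

```python
class Vector:
    """Growable array; a fixed-length list stands in for raw allocated memory."""

    MIN_CAPACITY = 16  # assumption: never shrink below the starting capacity

    def __init__(self):
        self._capacity = self.MIN_CAPACITY
        self._size = 0
        self._data = [None] * self._capacity

    def size(self):
        return self._size

    def capacity(self):
        return self._capacity

    def at(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of bounds")
        return self._data[index]

    def push(self, item):
        # When full, double the capacity before adding.
        if self._size == self._capacity:
            self._resize(self._capacity * 2)
        self._data[self._size] = item
        self._size += 1

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty vector")
        self._size -= 1
        item = self._data[self._size]
        self._data[self._size] = None
        # When only a quarter of the storage is in use, halve the capacity.
        if self._capacity > self.MIN_CAPACITY and self._size <= self._capacity // 4:
            self._resize(max(self.MIN_CAPACITY, self._capacity // 2))
        return item

    def _resize(self, new_capacity):
        # Allocate a new block and copy the live items over: O(n) per resize,
        # but amortized O(1) per push because the capacity doubles.
        new_data = [None] * new_capacity
        for i in range(self._size):
            new_data[i] = self._data[i]
        self._data = new_data
        self._capacity = new_capacity
```

Shrinking at 1/4 rather than 1/2 avoids resizing back and forth when pushes and pops alternate right at a capacity boundary.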
-
Linked Lists
- Description:
- C Code (video) - not the whole video, just portions about Node struct and memory allocation.
- Linked List vs Arrays:
- why you should avoid linked lists (video)
- Gotcha: you need pointer to pointer knowledge: (for when you pass a pointer to a function that may change the address where that pointer points) This page is just to get a grasp on ptr to ptr. I don't recommend this list traversal style. Readability and maintainability suffer due to cleverness.
- implement (I did with tail pointer & without):
- size() - returns number of data elements in list
- empty() - bool returns true if empty
- value_at(index) - returns the value of the nth item (starting at 0 for first)
- push_front(value) - adds an item to the front of the list
- pop_front() - remove front item and return its value
- push_back(value) - adds an item at the end
- pop_back() - removes end item and returns its value
- front() - get value of front item
- back() - get value of end item
- insert(index, value) - insert value at index, so current item at that index is pointed to by new item at index
- erase(index) - removes node at given index
- value_n_from_end(n) - returns the value of the node at nth position from the end of the list
- reverse() - reverses the list
- remove_value(value) - removes the first item in the list with this value
- Doubly-linked List
- Description (video)
- No need to implement
-
Stack
- Stacks (video)
- Using Stacks Last-In First-Out (video)
- Will not implement. Implementing with array is trivial.
-
Queue
- Using Queues First-In First-Out(video)
- Queue (video)
- Circular buffer/FIFO
- Priority Queues (video)
- Implement using linked-list, with tail pointer:
- enqueue(value) - adds value at position at tail
- dequeue() - returns value and removes least recently added element (front)
- empty()
- Implement using fixed-sized array:
- enqueue(value) - adds item at end of available storage
- dequeue() - returns value and removes least recently added element
- empty()
- full()
- Cost:
- a bad implementation using linked list where you enqueue at head and dequeue at tail would be O(n) because you'd need the next to last element, causing a full traversal each dequeue
- enqueue: O(1) (amortized, linked list and array [probing])
- dequeue: O(1) (linked list and array)
- empty: O(1) (linked list and array)
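For the fixed-size array version, a circular buffer keeps both enqueue and dequeue at O(1). A minimal Python sketch, assuming a hypothetical class name `ArrayQueue` and exceptions of my choosing for the full and empty cases:

```python
class ArrayQueue:
    """FIFO queue on a fixed-size array, used as a circular buffer."""

    def __init__(self, capacity):
        self._data = [None] * capacity
        self._head = 0    # index of the least recently added item
        self._count = 0   # number of items currently stored

    def empty(self):
        return self._count == 0

    def full(self):
        return self._count == len(self._data)

    def enqueue(self, value):
        if self.full():
            raise OverflowError("queue is full")
        # The tail position wraps around to the start of the array.
        tail = (self._head + self._count) % len(self._data)
        self._data[tail] = value
        self._count += 1

    def dequeue(self):
        if self.empty():
            raise IndexError("queue is empty")
        value = self._data[self._head]
        self._data[self._head] = None
        self._head = (self._head + 1) % len(self._data)
        self._count -= 1
        return value
```

Tracking a count instead of a separate tail index makes it easy to tell a full queue from an empty one.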
-
Hash table
-
Videos:
-
Online Courses:
-
implement with array using linear probing
- hash(k, m) - m is size of hash table
- add(key, value) - if key already exists, update value
- exists(key)
- get(key)
- remove(key)
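Here is one way the linear probing operations above might look in Python. It is a sketch under assumptions: the table has a fixed size m with no resizing, removed slots are marked with a tombstone so later lookups keep probing past them, and the names `_EMPTY`, `_DELETED`, `_probe`, and `_find` are my own helpers.

```python
_EMPTY = object()    # slot never used
_DELETED = object()  # tombstone: slot was used, then removed


class HashTable:
    def __init__(self, m=8):
        self._m = m
        self._keys = [_EMPTY] * m
        self._values = [None] * m

    def _hash(self, k):
        return hash(k) % self._m

    def _probe(self, key):
        """Yield slot indices from the home slot onward, wrapping around once."""
        start = self._hash(key)
        for step in range(self._m):
            yield (start + step) % self._m

    def _find(self, key):
        for i in self._probe(key):
            if self._keys[i] is _EMPTY:
                return -1  # an empty slot ends the probe sequence
            if self._keys[i] is not _DELETED and self._keys[i] == key:
                return i
        return -1

    def add(self, key, value):
        first_free = None
        for i in self._probe(key):
            slot = self._keys[i]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = i
                break
            if slot is _DELETED:
                if first_free is None:
                    first_free = i  # reusable, but keep looking for the key
            elif slot == key:
                self._values[i] = value  # key already exists: update
                return
        if first_free is None:
            raise OverflowError("hash table is full")
        self._keys[first_free] = key
        self._values[first_free] = value

    def exists(self, key):
        return self._find(key) != -1

    def get(self, key):
        i = self._find(key)
        if i == -1:
            raise KeyError(key)
        return self._values[i]

    def remove(self, key):
        i = self._find(key)
        if i != -1:
            self._keys[i] = _DELETED
            self._values[i] = None
```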
-
More Knowledge
-
Binary search
- Binary Search (video)
- Binary Search (video)
- detail
- Implement:
- binary search (on sorted array of integers)
- binary search using recursion
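Both variants fit in a few lines. A minimal Python sketch, assuming the input is a sorted list of integers and that -1 means "not found" (my convention, matching find() in the vector section):

```python
def binary_search(items, target):
    """Iterative binary search on a sorted list; returns an index or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


def binary_search_recursive(items, target, lo=0, hi=None):
    """Recursive binary search on a sorted list; returns an index or -1."""
    if hi is None:
        hi = len(items) - 1
    if lo > hi:
        return -1
    mid = lo + (hi - lo) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return binary_search_recursive(items, target, mid + 1, hi)
    return binary_search_recursive(items, target, lo, mid - 1)
```

Writing the midpoint as `lo + (hi - lo) // 2` does not matter in Python, but it is the habit that avoids integer overflow in C, C++, and Java.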
-
Bitwise operations
- Bits cheat sheet - you should know many of the powers of 2 (from 2^1 to 2^16, and 2^32)
- Get a really good understanding of manipulating bits with: &, |, ^, ~, >>, <<
- 2s and 1s complement
- count set bits
- round to next power of 2:
- swap values:
- absolute value:
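A few of these tricks can be sketched in Python. The function names are my own, and since Python integers have arbitrary precision these assume non-negative inputs; the fixed-width C versions covered by the linked material differ in detail.

```python
def count_set_bits(n):
    """Count 1 bits in a non-negative integer (Kernighan's method)."""
    count = 0
    while n:
        n &= n - 1  # clears the lowest set bit
        count += 1
    return count


def next_power_of_two(n):
    """Smallest power of 2 that is >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


def xor_swap(a, b):
    """Swap two integers without a temporary, using XOR."""
    a ^= b
    b ^= a
    a ^= b
    return a, b
```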
Trees
-
Trees - Notes & Background
- Series: Core Trees (video)
- Series: Trees (video)
- basic tree construction
- traversal
- manipulation algorithms
- BFS (breadth-first search)
- MIT (video)
- level order (BFS, using queue): time complexity O(n); space complexity best O(1), worst O(n/2) = O(n)
- DFS (depth-first search)
- MIT (video)
- notes: time complexity O(n); space complexity best O(log n) (avg. height of tree), worst O(n)
- inorder (DFS: left, self, right)
- postorder (DFS: left, right, self)
- preorder (DFS: self, left, right)
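The four traversal orders above can be sketched as follows. This is a minimal Python illustration with a hypothetical `Node` class; the recursive versions build new lists for readability rather than speed.

```python
from collections import deque


class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def inorder(node):
    """DFS: left, self, right."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def preorder(node):
    """DFS: self, left, right."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def postorder(node):
    """DFS: left, right, self."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


def level_order(root):
    """BFS: visit nodes level by level using a queue."""
    result = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        result.append(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return result
```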
-
Binary search trees: BSTs
- Binary Search Tree Review (video)
- Series (video)
- starts with symbol table and goes through BST applications
- Introduction (video)
- MIT (video)
- C/C++:
- Binary search tree - Implementation in C/C++ (video)
- BST implementation - memory allocation in stack and heap (video)
- Find min and max element in a binary search tree (video)
- Find height of a binary tree (video)
- Binary tree traversal - breadth-first and depth-first strategies (video)
- Binary tree: Level Order Traversal (video)
- Binary tree traversal: Preorder, Inorder, Postorder (video)
- Check if a binary tree is binary search tree or not (video)
- Delete a node from Binary Search Tree (video)
- Inorder Successor in a binary search tree (video)
- Implement:
- insert // insert value into tree
- get_node_count // get count of values stored
- print_values // prints the values in the tree, from min to max
- delete_tree
- is_in_tree // returns true if given value exists in the tree
- get_height // returns the height in nodes (single node's height is 1)
- get_min // returns the minimum value stored in the tree
- get_max // returns the maximum value stored in the tree
- is_binary_search_tree
- delete_value
- get_successor // returns next-highest value in tree after given value, -1 if none
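A few of these functions, sketched in Python rather than C. The `BSTNode` class is hypothetical, duplicates are ignored on insert (my assumption), and `get_successor` returns -1 when there is no higher value, as described above.

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert value and return the (possibly new) root. Duplicates are ignored."""
    if root is None:
        return BSTNode(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root


def get_height(root):
    """Height in nodes: an empty tree is 0, a single node is 1."""
    if root is None:
        return 0
    return 1 + max(get_height(root.left), get_height(root.right))


def is_binary_search_tree(root, low=None, high=None):
    """Check that every value lies strictly between the bounds set by its ancestors."""
    if root is None:
        return True
    if low is not None and root.value <= low:
        return False
    if high is not None and root.value >= high:
        return False
    return (is_binary_search_tree(root.left, low, root.value)
            and is_binary_search_tree(root.right, root.value, high))


def get_successor(root, value):
    """Return the next-highest value after the given value, or -1 if none."""
    successor = -1
    node = root
    while node is not None:
        if value < node.value:
            successor = node.value  # best candidate so far; look for a smaller one
            node = node.left
        else:
            node = node.right
    return successor
```

Checking only each node against its direct children is a common bug in is_binary_search_tree; passing the bounds down catches violations deeper in a subtree.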
-
Heap / Priority Queue / Binary Heap
- visualized as a tree, but is usually linear in storage (array, linked list)
- Heap
- Introduction (video)
- Naive Implementations (video)
- Binary Trees (video)
- Tree Height Remark (video)
- Basic Operations (video)
- Complete Binary Trees (video)
- Pseudocode (video)
- Heap Sort - jumps to start (video)
- Heap Sort (video)
- Building a heap (video)
- MIT: Heaps and Heap Sort (video)
- CS 61B Lecture 24: Priority Queues (video)
- Linear Time BuildHeap (max-heap)
- Implement a max-heap:
- insert
- sift_up - needed for insert
- get_max - returns the max item, without removing it
- get_size() - return number of elements stored
- is_empty() - returns true if heap contains no elements
- extract_max - returns the max item, removing it
- sift_down - needed for extract_max
- remove(i) - removes item at index i
- heapify - create a heap from an array of elements, needed for heap_sort
- heap_sort() - take an unsorted array and turn it into a sorted array in-place using a max heap
- note: using a min heap instead would save operations, but double the space needed (cannot do in-place).
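The core of the max-heap work is sift_down; heapify and heap_sort are built on top of it. A minimal Python sketch on a plain list, assuming the usual array layout where the children of index i sit at 2i+1 and 2i+2:

```python
def sift_down(items, i, size):
    """Move items[i] down until both children within items[:size] are no larger."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < size and items[left] > items[largest]:
            largest = left
        if right < size and items[right] > items[largest]:
            largest = right
        if largest == i:
            return
        items[i], items[largest] = items[largest], items[i]
        i = largest


def heapify(items):
    """Turn a list into a max-heap in place, in linear time."""
    for i in range(len(items) // 2 - 1, -1, -1):
        sift_down(items, i, len(items))


def heap_sort(items):
    """Sort a list in place, ascending, using a max-heap."""
    heapify(items)
    for end in range(len(items) - 1, 0, -1):
        # Move the current max to its final position, then restore the heap.
        items[0], items[end] = items[end], items[0]
        sift_down(items, 0, end)
```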
-
Tries
- Note there are different kinds of tries. Some have prefixes, some don't, and some use string instead of bits to track the path.
- I read through code, but will not implement.
- Notes on Data Structures and Programming Techniques
- Short course videos:
- The Trie: A Neglected Data Structure
- TopCoder - Using Tries
- Stanford Lecture (real world use case) (video)
- MIT, Advanced Data Structures, Strings (can get pretty obscure about halfway through)
-
Balanced search trees
-
Know at least one type of balanced binary tree (and know how it's implemented):
-
"Among balanced search trees, AVL and 2/3 trees are now passé, and red-black trees seem to be more popular. A particularly interesting self-organizing data structure is the splay tree, which uses rotations to move any accessed key to the root." - Skiena
-
Of these, I chose to implement a splay tree. From what I've read, you won't implement a balanced search tree in your interview. But I wanted exposure to coding one up and let's face it, splay trees are the bee's knees. I did read a lot of red-black tree code.
- splay tree: insert, search, delete functions
If you end up implementing red/black tree try just these:
- search and insertion functions, skipping delete
-
I want to learn more about B-Tree since it's used so widely with very large data sets.
-
AVL trees
- In practice: From what I can tell, these aren't used much in practice, but I could see where they would be: The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly balanced than red–black trees, leading to slower insertion and removal but faster retrieval. This makes it attractive for data structures that may be built once and loaded without reconstruction, such as language dictionaries (or program dictionaries, such as the opcodes of an assembler or interpreter).
- MIT AVL Trees / AVL Sort (video)
- AVL Trees (video)
- AVL Tree Implementation (video)
- Split And Merge
-
Splay trees
- In practice: Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors, data compression, ropes (replacement of string used for long text strings), in Windows NT (in the virtual memory, networking, and file system code) etc.
- CS 61B: Splay Trees (video)
- MIT Lecture: Splay Trees:
- Gets very mathy, but watch the last 10 minutes for sure.
- Video
-
2-3 search trees
- In practice: 2-3 trees have faster inserts at the expense of slower searches (since height is more compared to AVL trees).
- You would use 2-3 tree very rarely because its implementation involves different types of nodes. Instead, people use Red Black trees.
- 23-Tree Intuition and Definition (video)
- Binary View of 23-Tree
- 2-3 Trees (student recitation) (video)
-
2-3-4 Trees (aka 2-4 trees)
- In practice: For every 2-4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion operations on 2-4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2-4 trees an important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce 2-4 trees just before red–black trees, even though 2-4 trees are not often used in practice.
- CS 61B Lecture 26: Balanced Search Trees (video)
- Bottom Up 234-Trees (video)
- Top Down 234-Trees (video)
-
B-Trees
- fun fact: it's a mystery, but the B could stand for Boeing, Balanced, or Bayer (co-inventor)
- In Practice: B-Trees are widely used in databases. Most modern filesystems use B-trees (or Variants). In addition to its use in databases, the B-tree is also used in filesystems to allow quick random access to an arbitrary block in a particular file. The basic problem is turning the file block i address into a disk block (or perhaps to a cylinder-head-sector) address.
- B-Tree
- Introduction to B-Trees (video)
- B-Tree Definition and Insertion (video)
- B-Tree Deletion (video)
- MIT 6.851 - Memory Hierarchy Models (video) - covers cache-oblivious B-Trees, very interesting data structures - the first 37 minutes are very technical, may be skipped (B is block size, cache line size)
-
Red/black trees
- In practice: Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures which provide worst-case guarantees; for example, many data structures used in computational geometry can be based on red–black trees, and the Completely Fair Scheduler used in current Linux kernels uses red–black trees. In Java 8, HashMap was modified so that a bucket holding many colliding keys (for example, keys with poor hashcodes) is stored as a red–black tree instead of a linked list.
- Aduni - Algorithms - Lecture 4 (link jumps to starting point) (video)
- Aduni - Algorithms - Lecture 5 (video)
- Red-Black Tree
- An Introduction To Binary Search And Red Black Tree
-
-
N-ary (K-ary, M-ary) trees
- note: the N or K is the branching factor (max branches)
- binary trees are a 2-ary tree, with branching factor = 2
- 2-3 trees are 3-ary
- K-Ary Tree
Sorting
-
Notes:
-
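For heapsort, see the Heap data structure above. Heap sort is great, but not stable.
-
Stanford lectures on sorting:
-
Shai Simonson lectures, Aduni.org:
-
Steven Skiena lectures on sorting:
-
UC Berkeley lectures:
-
- Merge sort:
-
- Quick sort:
-
Implement:
- Mergesort: O(n log n) average and worst case
- Quicksort: O(n log n) average case
- Selection sort and insertion sort are both O(n^2) average and worst case
- For heapsort, see the Heap data structure above.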
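The two O(n log n) sorts in the list above can be sketched in Python as follows. The quicksort uses a random pivot with Lomuto partitioning; that is one common choice of mine, not the only way to do it.

```python
import random


def merge_sort(items):
    """Return a new sorted list. Stable; O(n log n) in the average and worst case."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal items in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def quick_sort(items, lo=0, hi=None):
    """Sort a list in place. O(n log n) on average."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return
    # A random pivot makes the O(n^2) worst case unlikely on sorted input.
    pivot_index = random.randint(lo, hi)
    items[pivot_index], items[hi] = items[hi], items[pivot_index]
    pivot = items[hi]
    store = lo
    for i in range(lo, hi):
        if items[i] < pivot:
            items[i], items[store] = items[store], items[i]
            store += 1
    items[store], items[hi] = items[hi], items[store]
    quick_sort(items, lo, store - 1)
    quick_sort(items, store + 1, hi)
```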
-
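For curiosity, some extras that are not required:
Graphs
Graphs can be used to represent many problems in computer science, so this section is long, like trees and sorting were.
-
Notes from Yegge:
- There are three basic ways to represent a graph in memory:
- objects and pointers
- matrix
- adjacency list
- Familiarize yourself with each representation and its pros and cons
- BFS and DFS - know their computational complexity, their tradeoffs, and how to implement them in real code
- When asked a question, look for a graph-based solution first, then move on if none.
-
Skiena lectures - great intro:
-
Graphs (review and more):
- 6.006 Single-Source Shortest Paths Problem (video)
- 6.006 Dijkstra (video)
- 6.006 Bellman-Ford (video)
- 6.006 Speeding Up Dijkstra (video)
- Aduni: Graph Algorithms I - Topological Sorting, Minimum Spanning Trees, Prim's Algorithm - Lecture 6 (video)
- Aduni: Graph Algorithms II - DFS, BFS, Kruskal's Algorithm, Union Find Data Structure - Lecture 7 (video)
- Aduni: Graph Algorithms III: Shortest Path - Lecture 8 (video)
- Aduni: Graph Alg. IV: Intro to geometric algorithms - Lecture 9 (video)
- CS 61B 2014 (starting at 58:09) (video)
- CS 61B 2014: Weighted graphs (video)
- Greedy Algorithms: Minimum Spanning Tree (video)
- Strongly Connected Components: Kosaraju's Algorithm (video)
-
Full Coursera Course:
-
Yegge: If you get a chance, try to study up on fancier algorithms:
- Dijkstra's algorithm - see above - 6.006
- A*
-
I'll implement:
- DFS with adjacency list (recursive)
- DFS with adjacency list (iterative with stack)
- DFS with adjacency matrix (recursive)
- DFS with adjacency matrix (iterative with stack)
- BFS with adjacency list
- BFS with adjacency matrix
- single-source shortest path (Dijkstra)
- minimum spanning tree
- DFS-based algorithms (see Aduni videos above):
- check for cycle (needed for topological sort, since we'll check for a cycle before starting)
- topological sort
- count connected components in a graph
- list strongly connected components
- check for bipartite graph
You'll get more graph practice in Skiena's book (see the Books section below) and the interview books.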
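To give a feel for the adjacency-list versions, here is a minimal Python sketch of recursive DFS, BFS, and a DFS-based topological sort with cycle detection. I'm assuming the graph is a dict that maps every vertex to a list of its neighbors; the function names are my own.

```python
from collections import deque


def dfs_recursive(graph, start):
    """Return vertices in the order DFS first visits them."""
    order, seen = [], set()

    def visit(v):
        seen.add(v)
        order.append(v)
        for n in graph[v]:
            if n not in seen:
                visit(n)

    visit(start)
    return order


def bfs(graph, start):
    """Return vertices in the order BFS visits them."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for n in graph[v]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return order


def topological_sort(graph):
    """Return a topological order of a directed graph; raise ValueError on a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {v: WHITE for v in graph}
    finished = []

    def visit(v):
        color[v] = GRAY
        for n in graph[v]:
            if color[n] == GRAY:
                raise ValueError("graph has a cycle")
            if color[n] == WHITE:
                visit(n)
        color[v] = BLACK
        finished.append(v)

    for v in graph:
        if color[v] == WHITE:
            visit(v)
    return finished[::-1]  # reverse finishing order
```

Both searches run in O(V + E) on an adjacency list; with an adjacency matrix the neighbor scan makes them O(V^2).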
Even More Knowledge
-
Recursion
- Stanford lectures on recursion and backtracking:
- when it is appropriate to use it
- how is tail recursion better than not?
-
Dynamic Programming
-
This subject can be pretty difficult, as each DP-solvable problem must be defined as a recurrence relation, and coming up with it can be tricky.
-
I suggest looking at many examples of DP problems until you have a solid understanding of the pattern involved.
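As one example of the pattern, here is edit distance (Levenshtein distance) written bottom-up in Python. The recurrence is in the comments; the function name and the unit costs for insert, delete, and substitute are my assumptions for the sketch.

```python
def edit_distance(a, b):
    """Minimum number of single-character inserts, deletes, and substitutions
    needed to turn string a into string b."""
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(len(b) + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            # Recurrence: best of delete, insert, or match/substitute.
            dist[i][j] = min(
                dist[i - 1][j] + 1,
                dist[i][j - 1] + 1,
                dist[i - 1][j - 1] + cost,
            )
    return dist[len(a)][len(b)]
```

Once the recurrence and base cases are written down, the code is mostly a matter of filling the table in an order where every needed subproblem is already solved.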
-
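Videos:
- The Skiena videos can be hard to follow, since his whiteboard writing is sometimes too small to read.
- Skiena: CSE373 2012 - Lecture 19 - Introduction to Dynamic Programming (video)
- Skiena: CSE373 2012 - Lecture 20 - Edit Distance (video)
- Skiena: CSE373 2012 - Lecture 21 - Dynamic Programming Examples (video)
- Skiena: CSE373 2012 - Lecture 22 - Applications of Dynamic Programming (video)
- Simonson: Dynamic Programming 0 (starts at 59:18) (video)
- Simonson: Dynamic Programming I - Lecture 11 (video)
- Simonson: Dynamic Programming II - Lecture 12 (video)
- List of individual DP problems (each is short): Dynamic Programming (video)
-
Yale Lecture notes:
-
Coursera:
-
-
Combinatorics (n choose k) & Probability
- Math Skills: How to find Factorial, Permutation and Combination (Choose) (video)
- Make School: Probability (video)
- Make School: More Probability and Markov Chains (video)
- Khan Academy:
- Course layout:
- Just the videos - 41 (each is short and to the point):
-
NP, NP-Complete and Approximation Algorithms
- Know about the most famous NP-complete problems, such as traveling salesman and the knapsack problem, and be able to recognize them when an interviewer asks you them in disguise.
- Know what NP-complete means.
- Computational Complexity (video)
- Simonson:
- Skiena:
- Complexity: P, NP, NP-completeness, Reductions (video)
- Complexity: Approximation Algorithms (video)
- Complexity: Fixed-Parameter Algorithms (video)
- Peter Norvig discusses near-optimal solutions to the traveling salesman problem:
- Pages 1048 - 1140 in Introduction to Algorithms (CLRS).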
-
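Caches
-
Processes and Threads
- Computer Science 162 - Operating Systems (25 videos):
- for processes and threads see videos 1-11
- Operating Systems and System Programming (video)
- What Is The Difference Between A Process And A Thread?
- Covers:
- Processes, threads, coroutines
- difference between processes and threads
- processes
- threads
- locks
- mutexes
- semaphores
- monitors
- how they work
- deadlock
- livelock
- CPU activity, interrupts, context switching
- Modern concurrency constructs with multicore processors
- Process resource needs (memory: code, static storage, stack, heap, and also file descriptors, I/O)
- Thread resource needs (shares the above with other threads in the same process, but each thread has its own program counter, stack pointer, registers, and stack)
- Forking is really copy-on-write: memory pages are shared read-only until one of the processes writes to a page, and only then is that page copied.
- Context switching
- How context switching is initiated by the operating system and underlying hardware
- threads in C++ (series - 10 videos)
- coroutines in Python (videos):
System design and scalability form a very large topic with many subtopics and resources, since there is a lot to consider when designing software and hardware that scale well. Expect to spend quite a bit of time on this.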
-
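System Design, Scalability, Data Handling
- Considerations from Yegge:
- scalability
- Distill large data sets to single values
- Transform one data set to another
- Handling obscenely large amounts of data
- system
- feature sets
- interfaces
- class hierarchies
- designing a system under certain constraints
- simplicity and robustness
- tradeoffs
- performance analysis and optimization
- START HERE: HiredInTech: System Design
- How Do I Prepare To Answer Design Questions In A Technical Interview?
- 8 Things You Need to Know Before a System Design Interview
- Algorithm design
- Database Normalization - 1NF, 2NF, 3NF and 4NF (video)
- System Design Interview - There are a lot of resources in this one. Look through the articles and examples. I put some of them below.
- How to ace a systems design interview
- Numbers Everyone Should Know
- How long does it take to make a context switch?
- Transactions Across Datacenters (video)
- A plain English introduction to CAP Theorem
- Paxos Consensus algorithm:
- Consistent Hashing
- NoSQL Patterns
- OOSE: UML 2.0 Series (video)
- OOSE: Software Dev Using UML and Java (21 videos):
- Can skip this if you have a great grasp of OO and OO design practices.
- OOSE: Software Dev Using UML and Java
- SOLID OOP Principles:
- Bob Martin SOLID Principles of Object Oriented and Agile Design (video)
- SOLID Design Patterns in C# (video)
- SOLID Principles (video)
- S - Single responsibility principle | each object has a single responsibility
- O - Open/closed principle | objects in production should be open for extension but closed for modification
- L - Liskov substitution principle | base class and derived class follow the ‘IS A’ principle
- I - Interface segregation principle | clients should not be forced to implement interfaces they don't use
- D - Dependency inversion principle | reduce the dependencies in the composition of objects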
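- Scalability:
- Great overview (video)
- Short series:
- Scalable Web Architecture and Distributed Systems
- Fallacies of Distributed Computing Explained
- Pragmatic Programming Techniques
- Jeff Dean - Building Software Systems at Google (video)
- Introduction to Architecting Systems for Scale
- Scaling mobile games to a global audience using App Engine and cloud storage (video)
- How Google Does Planet-Scale Engineering for Planet-Scale Infra (video)
- The Importance of Algorithms
- Sharding
- Scale at Facebook (2009)
- Scale at Facebook (2012), "Building for a Billion Users" (video)
- Engineering for the Long Game - Astrid Atkinson Keynote (video)
- 7 Years Of YouTube Scalability Lessons In 30 Minutes
- How PayPal Scaled To A Billion Daily Transactions Using Just 8 VMs
- How to Remove Duplicates in Large Datasets
- A look inside Etsy's scale and engineering culture with Jon Cowie (video)
- What Led Amazon to its Own Microservices Architecture
- To Compress Or Not To Compress, That Was Uber's Question
- Asynchronous I/O Tarantool Queue
- When Should Approximate Query Processing Be Used?
- Google's Transition From Single Datacenter, To Failover, To A Native Multihomed Architecture
- Spanner
- Egnyte Architecture: Lessons Learned In Building And Scaling A Multi Petabyte Distributed System
- Machine Learning Driven Programming: A New Programming For A New World
- The Image Optimization Technology That Serves Millions Of Requests Per Day
- A Patreon Architecture Short
- Tinder: How Does One Of The Largest Recommendation Engines Decide Who You'll See Next?
- Design Of A Modern Cache
- Live Video Streaming At Facebook Scale
- A Beginner's Guide To Scaling To 11 Million+ Users On Amazon's AWS
- Should Latency-Sensitive Applications Use Docker?
- Is AMP A Threat To Google?
- A 360 Degree View Of The Entire Netflix Stack
- Latency Is Everywhere - How To Crush It
- Serverless architecture
- What Powers Instagram: Hundreds of Instances, Dozens of Technologies
- Cinchcast Architecture - Producing 1,500 Hours Of Audio Every Day
- Justin.Tv's Live Video Broadcasting Architecture
- Playfish's Social Gaming Architecture - 50 Million Monthly Users And Growing
- TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data
- PlentyOfFish Architecture
- Salesforce Architecture - How They Handle 1.3 Billion Transactions A Day
- ESPN's Architecture At Scale
- See "Messaging, Serialization, and Queueing Systems" way below for info on some of the technologies that can glue services together
- Twitter:
- For even more, see the "Mining Massive Datasets" video series in the Video Series section.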
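- Practicing the system design process: here are some guidelines, each with documentation on how it was handled in the real world.
- review: System Design from HiredInTech
- cheat sheet
- flow:
- Understand the problem and scope:
- define the use cases, with the interviewer's help
- suggest additional features
- remove items that the interviewer deems out of scope
- assume high availability is required, and add it as a use case
- Think about constraints:
- ask how many requests per month
- ask how many requests per second (they may volunteer it or make you do the math)
- estimate the percentage of reads vs. writes
- keep the 80/20 rule in mind when estimating
- how much data is written per second
- total storage required over 5 years
- how much data is read per second
- Abstract design:
- layers (service, data, caching)
- infrastructure: load balancing, messaging
- rough overview of any key algorithm that drives the service
- consider bottlenecks and determine solutions
- Exercises: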
-
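Papers
- These are Google papers and well-known papers.
- You are unlikely to have time to read all of them from beginning to end. I recommend being selective and reading the core parts of some of them.
- 1978: Communicating Sequential Processes
- 2003: The Google File System
- replaced by Colossus in 2012
- 2004: MapReduce: Simplified Data Processing on Large Clusters
- mostly replaced by Cloud Dataflow?
- 2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping some sections)
- 2012: Google's Colossus
- paper not available
- 2012: AddressSanitizer: A Fast Address Sanity Checker:
- 2013: Spanner: Google's Globally-Distributed Database:
- 2014: Machine Learning: The High-Interest Credit Card of Technical Debt
- 2015: Continuous Pipelines at Google
- 2015: High-Availability at Massive Scale: Building Google's Data Infrastructure for Ads
- 2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- 2015: How Developers Search for Code: A Case Study
- 2016: Borg, Omega, and Kubernetes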
-
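Testing
- To cover:
- how unit testing works
- what are mock objects
- what is integration testing
- what is dependency injection
- Agile Software Testing with James Bach (video)
- Open Lecture by James Bach on Software Testing (video)
- Steve Freeman - Test-Driven Development (video)
- TDD is dead. Long live testing.
- Is TDD dead? (video)
- Video series (152 videos) - not all are needed (video)
- Test-Driven Web Development with Python
- Dependency injection:
- How to write tests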
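Mock objects and dependency injection are easiest to see together in a small unit test. This is a minimal Python sketch using the standard library's `unittest.mock`; the `Greeter` class and its `current_hour` clock interface are made up for the illustration.

```python
from unittest.mock import Mock


class Greeter:
    """The clock dependency is injected instead of being created internally,
    so a test can substitute a fake one."""

    def __init__(self, clock):
        self._clock = clock

    def greeting(self):
        hour = self._clock.current_hour()
        return "Good morning" if hour < 12 else "Good afternoon"


def test_greeting_uses_injected_clock():
    # The mock stands in for a real clock and records how it was called.
    fake_clock = Mock()
    fake_clock.current_hour.return_value = 9
    assert Greeter(fake_clock).greeting() == "Good morning"
    fake_clock.current_hour.assert_called_once()

    fake_clock.current_hour.return_value = 15
    assert Greeter(fake_clock).greeting() == "Good afternoon"
```

Because the test controls the clock, it gives the same result at any time of day, which is the main payoff of injecting the dependency.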
-
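Scheduling
- in an OS, how it works
- can be gleaned from the Operating System videos
-
Implement system routines
- understand what lies beneath the system APIs you use
- can you implement them yourself?
-
String searching & manipulations
- Search pattern in text (video)
- Rabin-Karp (videos):
- Knuth-Morris-Pratt (KMP):
- Boyer–Moore string search algorithm
- Coursera: Algorithms on Strings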
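Of the algorithms listed, Rabin-Karp is the shortest to sketch. This Python version is a minimal illustration: the `base` and `modulus` values are arbitrary choices of mine, and returning the first match index or -1 mirrors `str.find`.

```python
def rabin_karp(text, pattern, base=256, modulus=1_000_000_007):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, modulus)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus
    for start in range(n - m + 1):
        # Compare the strings only when the hashes match, to rule out collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            return start
        if start < n - m:
            # Roll the hash: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % modulus
    return -1
```

The rolling hash makes each window update O(1), so the expected running time is O(n + m), degrading toward O(nm) only when many hash collisions occur.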
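Final Review
This section has shorter videos that you can watch pretty quickly to review most of the important concepts.
It's nice if you want a refresher often.
Overview:
Sorting:
Books
Mentioned in Google Coaching
Read and do exercises:
-
The Algorithm Design Manual (Skiena)
- Book (can rent on Kindle):
- Half.com is a great resource for textbooks at good prices.
- Answers:
- Errata
Once you've understood everything in the daily plan, read and do exercises from the books above, then from the books below. Then move to coding challenges (further down below).
Read first:
Read second (recommended by many, but not in Google coaching docs):
- Cracking the Coding Interview, 6th Edition
- If you see people reading "The Google Resume", it is by the same author as "Cracking the Coding Interview", and the latter is the updated version.
Additional books
These were not suggested by Google, but I added them because I needed the background knowledge.
-
C Programming Language, Vol 2
-
C++ Primer Plus, 6th Edition
If you have time
-
Elements of Programming Interviews
- all code is in C++, which is very good if you're looking to use C++ in your interview
- a good book on problem solving in general.
Coding exercises/challenges
Once you've learned the theory, put it to work. Take coding challenges every day, as many as you can.
Programming Question Prep:
Challenge sites:
Once you're closer to the interview
- Cracking The Coding Interview (videos):
Your Resume
- Ten Tips for a (Slightly) Less Awful Resume
- This is the first key step to landing the interview
Be thinking of for when the interview comes
Think of about 20 interview questions you'll get, along the lines of the items below.
Have 2-3 answers for each.
Have a story, not just data, about something you accomplished. Trust me, everyone likes a story.
- Why do you want this job?
- What's the toughest problem you've solved?
- What's the biggest challenge you've faced?
- What are the best and worst designs you've seen?
- Suggest improvements to an existing Google product.
- How do you work best, as an individual and as part of a team?
- Which of your skills or experiences would be assets in the role, and why?
- What did you most enjoy at a particular job or project?
- What was the biggest challenge you faced at a particular job or project?
- What was the hardest bug you faced at a particular job or project?
- What did you learn at a particular job or project?
- What would you have done better at a particular job or project?
Have questions for the interviewer
Some of mine (I may already know the answer, but I want their opinion or the team's perspective):
- How large is your team?
- What does your dev cycle look like? Do you do waterfall/extreme programming/agile?
- Are rushes to deadlines common? Or is there flexibility?
- How are technology choices made on your team?
- How many meetings do you have per week?
- Do you feel your work environment helps you concentrate?
- What are you working on?
- What do you like about it?
- What is the work life like?
Once You've Got The Job
What more can I say? Congratulations!
Keep learning.
Getting the job is just the beginning.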
*****************************************************************************************************
*****************************************************************************************************
Everything below this point is optional. These are my recommendations, not Google's.
By studying these, you'll get greater exposure to more CS concepts, and will be better prepared for
any software engineering job.
*****************************************************************************************************
*****************************************************************************************************
Additional Learning
-
Unicode
-
Endianness
- Big And Little Endian
- Big Endian Vs Little Endian (video)
- Big And Little Endian Inside/Out (video)
- Very technical talk for kernel devs. Don't worry if most is over your head.
- The first half is enough.
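As a quick illustration of byte order, here is a minimal sketch using Python's standard `struct` and `sys` modules. The value `0x0A0B0C0D` is an arbitrary example picked for this sketch, not something taken from the videos above.

```python
import struct
import sys

value = 0x0A0B0C0D  # arbitrary example value (an assumption of this sketch)

# ">I" packs an unsigned 32-bit int big-endian, "<I" packs it little-endian.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

print(big.hex())     # 0a0b0c0d (most significant byte first)
print(little.hex())  # 0d0c0b0a (least significant byte first)

# Reading big-endian bytes as if they were little-endian yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0xd0c0b0a

# Byte order of the machine running this script: "little" or "big".
print(sys.byteorder)
```

Network protocols conventionally use big-endian ("network byte order"), which is why explicit format characters like `>` and `<` matter when parsing binary data.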
-
Emacs and vi(m)
- suggested by Yegge, from an old Amazon recruiting post: Familiarize yourself with a unix-based code editor
- vi(m):
- emacs:
-
Unix command line tools
-
Information theory (videos)
- Khan Academy
- more about Markov processes:
- See more in MIT 6.050J Information and Entropy series below.
-
Parity & Hamming Code (videos)
- Intro
- Parity
- Hamming Code:
- Error Checking
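To make parity and Hamming codes concrete, here is a minimal sketch of a Hamming(7,4) code with even parity. The function names and the bit layout `p1 p2 d1 p3 d2 d3 d4` are choices made for this example; the videos above may present the bits in a different order.

```python
def parity_bit(bits):
    """Even parity: the bit that makes the total number of 1s even."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits into 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = parity_bit([d1, d2, d4])  # checks positions 1, 3, 5, 7
    p2 = parity_bit([d1, d3, d4])  # checks positions 2, 3, 6, 7
    p3 = parity_bit([d2, d3, d4])  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(code):
    """Return (corrected codeword, 1-based position of the flipped bit, or 0)."""
    c = list(code)
    s1 = parity_bit([c[0], c[2], c[4], c[6]])
    s2 = parity_bit([c[1], c[2], c[5], c[6]])
    s3 = parity_bit([c[3], c[4], c[5], c[6]])
    position = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the error position
    if position:
        c[position - 1] ^= 1
    return c, position


codeword = hamming74_encode([1, 0, 1, 1])
print(codeword)  # [0, 1, 1, 0, 0, 1, 1]

damaged = list(codeword)
damaged[4] ^= 1  # flip the bit at position 5
print(hamming74_correct(damaged))  # ([0, 1, 1, 0, 0, 1, 1], 5)
```

A single parity bit can only detect an odd number of flipped bits. The three overlapping parity checks are what let this code locate and fix any single-bit error.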
-
Entropy
- also see videos below
- make sure to watch information theory videos first
- Information Theory, Claude Shannon, Entropy, Redundancy, Data Compression & Bits (video)
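As a small worked example, Shannon entropy in bits per symbol is the sum of p * log2(1/p) over each symbol's probability p. The sketch below estimates it from the symbol frequencies of a string; the function name `shannon_entropy` is made up for this example.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Entropy in bits per symbol, estimated from symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    # p = n / total, so log2(1 / p) = log2(total / n)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("aaaa"))  # 0.0 (no uncertainty)
print(shannon_entropy("abab"))  # 1.0 (one bit per symbol)
print(shannon_entropy("abcd"))  # 2.0 (two bits per symbol)
```

The lower the entropy, the more redundancy there is in the message, and redundancy is what compression exploits.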
-
Cryptography
- also see videos below
- make sure to watch information theory videos first
- Khan Academy Series
- Cryptography: Hash Functions
- Cryptography: Encryption
-
Compression
- make sure to watch information theory videos first
- Computerphile (videos):
- Compressor Head videos
- (optional) Google Developers Live: GZIP is not enough!
-
Networking (videos)
-
Computer Security
-
Garbage collection
-
Parallel Programming
-
Design patterns
- Quick UML review (video)
- Learn these patterns:
- strategy
- singleton
- adapter
- prototype
- decorator
- visitor
- factory, abstract factory
- facade
- observer
- proxy
- delegate
- command
- state
- memento
- iterator
- composite
- flyweight
- Chapter 6 (Part 1) - Patterns (video)
- Chapter 6 (Part 2) - Abstraction-Occurrence, General Hierarchy, Player-Role, Singleton, Observer, Delegation (video)
- Chapter 6 (Part 3) - Adapter, Facade, Immutable, Read-Only Interface, Proxy (video)
- Series of videos (27 videos)
- Head First Design Patterns
- I know the canonical book is "Design Patterns: Elements of Reusable Object-Oriented Software", but Head First is great for beginners to OO.
- Handy reference: 101 Design Patterns & Tips for Developers
-
Messaging, Serialization, and Queueing Systems
-
Fast Fourier Transform
-
Bloom Filter
- Given a Bloom filter with m bits and k hashing functions, both insertion and membership testing are O(k)
- Bloom Filters
- Bloom Filters | Mining of Massive Datasets | Stanford University
- Tutorial
- How To Write A Bloom Filter App
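To show why insertion and membership testing are both O(k), here is a minimal Bloom filter sketch with m bits and k hash functions. The class name, the use of salted SHA-256 to derive the k hash functions, and storing one byte per bit are all simplifications chosen for this example. Real implementations usually pack the bits and use faster non-cryptographic hashes.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m = m                # number of bits
        self.k = k                # number of hash functions (must be < 256 here)
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def _positions(self, item):
        """Yield k bit positions, one per salted hash of the item."""
        data = item.encode("utf-8")
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # k hash evaluations and k bit writes: O(k)
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # k hash evaluations and at most k bit reads: O(k)
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)

print("apple" in bf)  # True: an added item is always reported as present
```

A lookup for an item that was never added usually returns False, but it can return True (a false positive) when other items happen to have set all k of its bits. False negatives cannot happen.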
-
van Emde Boas Trees
-
Augmented Data Structures
-
Skip lists
- "These are somewhat of a cult data structure" - Skiena
- Randomization: Skip Lists (video)
- For animations and a little more detail
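For a feel of how a skip list works, here is a minimal sketch supporting insert, membership testing, and in-order iteration. The class names, the level cap of 16, and the promotion probability of 0.5 are assumptions of this example, not values taken from the lecture.

```python
import random


class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] = next node on level i


class SkipList:
    MAX_LEVEL = 16  # assumed cap for this sketch

    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.level = 1
        self.head = _Node(None, self.MAX_LEVEL)  # sentinel, holds no key

    def _random_level(self):
        # Flip coins: each success promotes the node one level higher.
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < self.p:
            level += 1
        return level

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL  # last node before key, per level
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self.level = max(self.level, level)
        new = _Node(key, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def __contains__(self, key):
        node = self.head
        for i in reversed(range(self.level)):  # start high, drop down a level
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        node = self.head.forward[0]  # level 0 is a plain sorted linked list
        while node is not None:
            yield node.key
            node = node.forward[0]


sl = SkipList(seed=42)
for key in [30, 10, 50, 20, 40]:
    sl.insert(key)
print(list(sl))  # [10, 20, 30, 40, 50]
print(20 in sl)  # True
print(25 in sl)  # False
```

The upper levels act as express lanes, so search and insert take O(log n) expected time without any rebalancing logic.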
-
Network Flows
-
Disjoint Sets & Union Find
- Disjoint Set
- UCB 61B - Disjoint Sets; Sorting & selection (video)
- Coursera (not needed since the above video explains it well):
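As a companion to the resources above, here is a minimal union-find sketch with the two standard optimizations, path compression and union by size. The class and method names are choices made for this example.

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.size = [1] * n

    def find(self, x):
        """Return the representative (root) of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a, b):
        """Merge the sets containing a and b. Return False if already merged."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        # Union by size: attach the smaller tree under the larger one.
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        return True


ds = DisjointSet(6)
ds.union(0, 1)
ds.union(1, 2)
ds.union(3, 4)
print(ds.find(0) == ds.find(2))  # True: 0 and 2 are connected through 1
print(ds.find(0) == ds.find(3))  # False: different sets
print(ds.union(2, 0))            # False: already in the same set
```

With both optimizations, each operation runs in nearly constant amortized time, which is why the structure shows up in Kruskal's minimum spanning tree algorithm and connectivity problems.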
-
Math for Fast Processing
-
Treap
- Combination of a binary search tree and a heap
- Treap
- Data Structures: Treaps explained (video)
- Applications in set operations
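To see how a treap combines a binary search tree with a heap, here is a minimal insert-only sketch. Keys follow BST order, random priorities follow max-heap order, and rotations restore the heap property after each insert. The function names and the choice of a max-heap are assumptions of this example.

```python
import random


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def rotate_right(node):
    """Lift the left child above node, preserving BST order."""
    left = node.left
    node.left, left.right = left.right, node
    return left


def rotate_left(node):
    """Lift the right child above node, preserving BST order."""
    right = node.right
    node.right, right.left = right.left, node
    return right


def insert(node, key, rng=random):
    """Insert key as in a BST, then rotate up while the heap order is violated."""
    if node is None:
        return Node(key, rng.random())
    if key < node.key:
        node.left = insert(node.left, key, rng)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key, rng)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node  # duplicate keys are ignored


def in_order(node):
    """Yield keys in sorted order."""
    if node is not None:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)


root = None
for key in [5, 2, 8, 1, 9, 3]:
    root = insert(root, key)
print(list(in_order(root)))  # [1, 2, 3, 5, 8, 9]
```

Because the priorities are random, the tree has the shape of a BST built from a random insertion order, giving O(log n) expected depth even when keys arrive sorted.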
-
Linear Programming (videos)
-
Geometry, Convex hull (videos)
-
Discrete math
- see videos below
-
Machine Learning
- Why ML?
- Google's Cloud Machine learning tools (video)
- Google Developers' Machine Learning Recipes (Scikit Learn & Tensorflow) (video)
- Tensorflow (video)
- Tensorflow Tutorials
- Practical Guide to implementing Neural Networks in Python (using Theano): http://www.analyticsvidhya.com/blog/2016/04/neural-networks-python-theano/
- Courses:
- Great starter course: Machine Learning - videos only - see videos 12-18 for a review of linear algebra (14 and 15 are duplicates)
- Neural Networks for Machine Learning
- Google's Deep Learning Nanodegree
- Google/Kaggle Machine Learning Engineer Nanodegree
- Self-Driving Car Engineer Nanodegree
- Metis Online Course ($99 for 2 months)
- Resources:
- Great book: Data Science from Scratch: First Principles with Python: https://www.amazon.com/Data-Science-Scratch-Principles-Python/dp/149190142X
- Data School: http://www.dataschool.io/
-
Go
--
Additional Detail on Some Subjects
I added these to reinforce some ideas already presented above, but didn't want to include them
above because it's just too much. It's easy to overdo it on a subject.
You want to get hired in this century, right?
-
More Dynamic Programming (videos)
-
Advanced Graph Processing (videos)
-
MIT Probability (mathy, and goes slowly, which is good for mathy things) (videos):
Video Series
Sit back and enjoy. "netflix and skill" :P
-
Excellent - MIT Calculus Revisited: Single Variable Calculus
-
Computer Science 70, 001 - Spring 2015 - Discrete Mathematics and Probability Theory
-
CSE373 - Analysis of Algorithms (25 videos)
-
UC Berkeley CS 152: Computer Architecture and Engineering (20 videos)
-
Carnegie Mellon - Computer Architecture Lectures (39 videos)
-
MIT 6.042J: Mathematics for Computer Science, Fall 2010 (25 videos)
-
MIT 6.050J: Information and Entropy, Spring 2008 (19 videos)
-
Stanford: Programming Paradigms (17 videos)