v8
V8 is Google’s open source high-performance JavaScript and WebAssembly engine, written in C++.
Loading...
Searching...
No Matches
pretenuring-propagation-reducer.h
Go to the documentation of this file.
1// Copyright 2023 the V8 project authors. All rights reserved.
2// Use of this source code is governed by a BSD-style license that can be
3// found in the LICENSE file.
4
5#ifndef V8_COMPILER_TURBOSHAFT_PRETENURING_PROPAGATION_REDUCER_H_
6#define V8_COMPILER_TURBOSHAFT_PRETENURING_PROPAGATION_REDUCER_H_
7
14#include "src/zone/zone.h"
15
17
18// This reducer propagates pretenuring (= allocations of Old objects rather than
19// Young objects) throughout the graph: if a young allocation is stored in an
20// old allocation, then we'll make it old instead. The idea being that 1) if an
21// object is stored in an old object, it makes sense for it for be considered
22// old, and 2) this reduces the size of the remembered sets.
23// For instance, if we have:
24//
25// a = Allocate(old)
26// b = Allocate(young)
27// c = Allocate(young)
28// b.x = c
29// a.x = b
30//
31// Then we'll make `b` and `c` allocate Old objects directly, because they'll be
32// stored to old objects (transitively for `c`, but it still counts).
33// On the other hand, if we have:
34//
35// a = Allocate(old)
36// b = Allocate(young)
37// c = Allocate(young)
38// a.x = c
39// b.c = a
40//
41// Then we'll allocate `c` Old as well (because it is stored in an old pointer),
42// but `b` will stay young, since it's not stored in an Old object (it contains
43// a pointer to an old object, but that's probably not a good reason to make it
44// old).
45//
46//
47// Implementation
48//
49// In a first phase, we iterate the input graph and create a directed graph
50// (called `store_graph_`) where each node points to the nodes stored in it (so,
51// an edge between `a` and `b` means that `b` is stored in `a`). We also collect
52// all of the old allocations in a separate list (`old_allocs_`). The
53// `store_graph_` of the first example above will thus be:
54//
55// a -----> b -----> c
56//
57// And the graph of the second example will be:
58//
59// a -----> c b -----> a
60//
61// (it contains two unconnected subgraphs)
62//
63// Then, in a second phase, we iterate the old allocations (`old_allocs_`), and
64// for each one, we do a DFS in `store_graph_`, marking all of the nodes we
65// encounter as old, and stopping on old nodes (which 1- prevents infinite loops
66// easily, and 2- is sound because all of the initially-old pointers are roots
67// of this phase). On the 2 examples above, `a` will be the old entry in
68// `old_allocs_`, so in both cases will do a single DFS starting from `a`. In
69// the 1st case, it's easy to see that this DFS will encounter `b` and `c`
70// (which will thus both become Old), while in the 2nd case, this DFS will only
71// reach `c` (which will thus become Old, while `b` won't be changed).
72//
73// To be more precise, we also record Phi inputs in `store_graph_`, so that if
74// we have something like:
75//
76// a = Allocate(old)
77// b = Allocate(young)
78// c = Allocate(young)
79// p = Phi(b, c)
80// a.x = p
81//
82// Then we oldify both `b` and `c`. In this case, `store_graph_` would be
83//
84// --------> b
85// /
86// a ----> p
87// \
88// --------> c
89//
90// Which means that in the second phase, we'll start a DFS on `a` (the only old
91// allocation), move to `p` (the only node reachable from `a`), and the oldify
92// `b` and `c` (which are reachable from `p`).
93//
94//
95// Limitation: when a Phi of old allocations is used as the left-hand side of a
96// Store where the value being stored is a young allocation, we don't oldify the
97// young allocation. For instance, we won't oldify `a` in this example:
98//
99// a = Allocate(young)
100// b = Allocate(old)
101// c = Allocate(old)
102// p = Phi(b, c)
103// p.x = a
104//
105// The reason being that the store_graph sturcture isn't well suited for this,
106// since an edge Phi->Node can mean either that Node is stored (via a StoreOp)
107// in Phi, or that Node is an input of Phi. The `store_graph_` for the example
108// above will thus look like:
109//
110// ------> b
111// /
112// p ------> a
113// \
114// ------> c
115//
116// In order to oldify `a`, we would need to register `p` in `old_allocs_`,
117// except that we should only do this when `p` is actually old, and we discover
118// that only in the second phase.
119// Consider for instance this more complex example:
120//
121// a = Allocate(old)
122// b = Allocate(young)
123// c = Allocate(young)
124// d = Allocate(young)
125// a.x = b
126// a.y = c
127// p = Phi(b, c)
128// p.x = d
129//
130// The graph will be:
131//
132// -----> b <-----
133// / \
134// a p -----> d
135// \ /
136// -----> c <-----
137//
138// And the only entry in `old_allocs_` will be `a`. During the DFS from `a`,
139// allocations `b` and `c` will be oldified. At this point, `p` will point to
140// edges to 2 old (`b` and `c`) and 1 young (`d`) nodes.
141// We could look at all Phis in `store_graph_` and consider one by one for being
142// roots of an oldifying DFS: if all of the inputs of a phi `p` (in the sense
143// OpIndex inputs in the input_graph) are Old, then start an oldifying DFS from
144// `p`. However, the worst case complexity would be something like O(n^2) where
145// `n` is the number of Phis in the graph (since we could end up checking all
146// Phis but only finding a single one that is old, but the DFS could make a
147// single other phi old, thus repeating the process). This complexity could be
148// made linear by maintaining additional datastructures on the side, but there
149// isn't much evidence that this optimization would be often useful in practice.
150
152 public:
160
161 void Run();
162
163 private:
164 void ProcessStore(const StoreOp& store);
165 void ProcessPhi(const PhiOp& phi);
166 void ProcessAllocate(const AllocateOp& allocate);
167
168 bool PushContainedValues(OpIndex base);
169 void OldifySubgraph(OpIndex old_alloc);
170
173
175 auto it = store_graph_.find(idx);
176 if (it != store_graph_.end()) return it->second;
177 return Create(idx);
178 }
179
181 DCHECK_EQ(store_graph_.count(idx), 0);
183 store_graph_.insert({idx, stored_items});
184 return stored_items;
185 }
186
188 auto it = store_graph_.find(idx);
189 if (it != store_graph_.end()) return it->second;
190 return nullptr;
191 }
192
196
197 // (see main comment at the begining of this file for the role of
198 // `store_graph_`)
199 // `store_graph_` contains mapping from OpIndex to vector<OpIndex>. If for an
200 // entry `a` it contains a vector `v`, it means that `a` has edges to all of
201 // the values in `v`.
203
204 // AllocateOp have an AllocationType field, which is set to kOld once they've
205 // been visited, thus ensuring that recursion ends. However, PhiOp don't have
206 // such a field. Thus, once we've visited a Phi, we store it in {old_phis_} to
207 // prevent revisiting it.
209
210 // Used in the final phase to do DFS in the graph from each old store. It
211 // could be a local variable, but we instead use an instance variable to reuse
212 // memory.
214};
215
216// Forward delcaration
217template <class Next>
219
220template <class Next>
221class PretenuringPropagationReducer : public Next {
222#if defined(__clang__)
223 // PretenuringPropagationReducer should run before MemoryOptimizationReducer
224 // (because once young allocations are marked for folding, they can't be
225 // oldified anymore). We enforce this by making PretenuringPropagationReducer
226 // run in the same phase as MemoryOptimizationReducer, but before.
228#endif
229
230 public:
231 TURBOSHAFT_REDUCER_BOILERPLATE(PretenuringPropagation)
232
233 void Analyze() {
235 Asm().modifiable_input_graph());
236 analyzer.Run();
237 Next::Analyze();
238 }
239};
240
241} // namespace v8::internal::compiler::turboshaft
242
243#endif // V8_COMPILER_TURBOSHAFT_PRETENURING_PROPAGATION_REDUCER_H_
T * insert(const T *pos, It first, It last)
T * New(Args &&... args)
Definition zone.h:114
ZoneAbslFlatHashMap< OpIndex, ZoneVector< OpIndex > * > store_graph_
#define TURBOSHAFT_REDUCER_BOILERPLATE(Name)
Definition assembler.h:823
#define DCHECK_EQ(v1, v2)
Definition logging.h:485