Status Update
Comments
nd...@google.com <nd...@google.com> #2
ma...@google.com <ma...@google.com> #3
mk...@google.com <mk...@google.com> #4
BlobInfo info = new BlobInfoFactory().loadBlobInfo(blobKey);
// This just gives the user's filename like "cute-kitten.jpg":
However, I'm looking for the autogenerated filename that is created in google storage. I refreshed the cloud storage explorer and found the GS filename generated for this test was "L2FwcHMtdXBsb2FkL2Jsb2JzL2UzNUFBVE0tMEIxM1dwREFTc0t3RW9LOFE" and I can't find a way to get this.
ma...@google.com <ma...@google.com>
nd...@google.com <nd...@google.com> #5
ll...@google.com <ll...@google.com> #6
ll...@google.com <ll...@google.com> #7
so...@google.com <so...@google.com> #8
For example, it is impossible to add metadata headers to the uploaded files that are stored in GCS.
tf...@google.com <tf...@google.com> #9
I was responding to
di...@google.com <di...@google.com> #10
A few clarifications... Just trying to help out the next guy that hits this, since I've since moved from GAE to AWS...
1. The documentation is unclear if you call delete on a blobkey if it deletes the key/pointer or the actual file in GCS. For some reason I assumed that the delete removed the blobkey, but not the actual file.
2. When our application was written to work with GCS, we had a custom uploader that just put files in GCS based on a naming convention into the filestore. So to read the file we didn't have to know a "key" because it was just a user's ID and some other derived info.
When we switched to the official uploader that always returned a blobkey, we couldn't do that anymore. Without changing our code to store the blobkey we would have no way to read it. Perhaps the poster from
sw...@google.com <sw...@google.com> #11
However, this issue also adds a lot of confusion. Files are being uploaded to the GS (I can see them in GS Manager) but also appear in GAE app blobstore viewer. Some question arise here:
- will I be charged twice, for GAE app storage and GS storage space
- does it differ from the case when I put data to GS via curl -X PUT http://.../bucker/file in any way, i.e. is GS differs in any way by implementation from Blobstore, or its just a rebrended storage with new REST API to access it and different data namespace?
sw...@google.com <sw...@google.com> #12
Is it correct then, that to do named filed upload I have to use GCS rest api?
sw...@google.com <sw...@google.com> #14
tf...@google.com <tf...@google.com> #16
Second, when using blobstore.create_gs_key('/gs/bucket/filename') returns a string, I guess it's a key associated with the file exsiting on GCS but it's not documented how to use it. First, I thought it is used to retrieve/delete object in GCS, but it's not! I tried blobstore.delete('key') to delete object in GCS it doesn't do anything. Meanwhile, when using get_uploads('fieldname')[0] to retrieve a Blobinfo from form, you can get BlobKey from this, i.g.:
uploaded_file = self.get_uploads('fieldname')[0]
file_blobkey = uploaded_file.key()
Then I can use this blobkey to delete object in BlobViewer and GCS, then the second question is what does create_gs_key() do? How we gonna use it?
Third question is, how do I serve files from GCS through AppEngine without using AppEngine's bandwidth? I know ImageService can serve image, does it work for pdf/csv/word etc. as well?
Need help here!
Thanks
jc...@google.com <jc...@google.com> #17
1. enable libunwind (userspace-only solution)
2. use coresight for unwinding (userspace-only solution, iiuc)
3. have a local patch to allow kernel unwind clang binaries (kernel-based solution)
* For 1, I think libunwind causes too much overhead and isn't suitable for most of our use cases. When I tried it on an Intel device last time, the system slow down is quite noticeable with libunwind (i.e. `perf record --call-graph dwarf`). The perf data recorded with dwarf isn't consumable by our tools like `pprof` either.
* For 2, I don't know enough about coresight to comment. I tried playing with the CLs in c#19 and noticed that our kukui boards don't seem to support coresight. I'm also not seeing kernel stack traces from the perf data recorded with coresight, but I probably didn't use it correctly.
* For 3, the drawback is that CL like [1] won't work if the userspace binary is built with gcc so the solution is not upstreamable. Enabling fp also has some performance impact on 32-bit arm binaries. This solution probably has the best compatibility with our existing tools though.
[1]:
I wonder if it'd make sense to land [1] in our local kernel tree as a stopgap before we can move all our arm64 boards to 64-bit userspace. If most of our userspace binaries are built with clang and if we'll be able to do PGO with the perf data collected, perhaps we can mitigate some of the performance losses introduced by enabling fp. It'll also make most of our perf profiling and debugging work much easier, if we can get usable stack unwinding through perf tool.
sw...@google.com <sw...@google.com> #18
js...@google.com <js...@google.com> #19
js...@google.com <js...@google.com> #20
jc...@google.com <jc...@google.com> #21
I don't think libunwind will be too helpful in solving our problem either because it's not supported by our infra. We can't get symbolization on go/pprof and terasymbol for example and hence no symbolization on go/cwp, which is our source for ADFO profiles.
If we merge
[1]: go/chromeos64
js...@google.com <js...@google.com> #22
Will, Doug, Stephen, any thoughts on the upstream side of this?
di...@google.com <di...@google.com> #23
Yeah, this is probably OK even if it's not great. It sounds as if arm32 still has legs for a little while longer but at least there's an end in sight so there's not long term debt. I'm happy with this if there's no other alternative.
---
Looking at the patch, though, I'm wondering if we could actually be dynamic without too much work. I'm embarrassed to say that I'm mostly perf-dumb, so I might need some handholding before I can match what people have done before and test things. ...but I guess the big difference is:
---
clang:
struct compat_frame_tail {
compat_uptr_t fp; /* a (struct compat_frame_tail *) in compat mode */
u32 lr;
} __packed;
gcc:
struct compat_frame_tail {
compat_uptr_t fp; /* a (struct compat_frame_tail *) in compat mode */
u32 sp;
u32 lr;
} __attribute__((packed));
---
Wouldn't it be easy to:
a) Start by assuming it's the clang setup.
b) Look at the "lr"
c) If the memory pointed to by "lr" is _not_ executable then it can't really be the link register (which needs to point to code); switch to assuming gcc.
I'm talking out of my rear end here, but maybe the code would look something like this:
struct compat_frame_tail_clang *clang_tail = ...;
vma = find_vma(current->active_mm, clang_tail->lr);
if (!(vma->vm_flags & VM_EXEC)) {
struct compat_frame_tail_gcc *gcc_tail = ...;
// act on gcc_tail
} else {
// act on clang_tail
}
Would that be too much overhead, though? It looks like "find_vma" has a cache so it might be fast enough?
so...@google.com <so...@google.com> #24
di...@google.com <di...@google.com> #25
As a cheat, you might be able to fast path. If the LR is odd (has the low bit set) then I think you can immediately assume it's a valid LR and you don't need to check. This is because stack pointers (and frame pointers) can't be odd. ...but if the LR happens to point to Thumb code then (I believe) the LR will always be odd. Sounds as if userspace is usually running in Thumb mode, right, so this fast path would probably hit most of the time.
so...@google.com <so...@google.com> #26
di...@google.com <di...@google.com> #27
I think with this we could be dynamic for all cases except gcc-compiled thumb but that's impossible to crawl anyway. Unless I messed up, we've got this:
ARM APCS (current kernel code):
Summary: R11 is the FP and points just after the useful info.
```
+-----------+ 0x1000
| random |
+-----------+ 0x1004
| old FP |
+-----------+ 0x1008
| SP |
+-----------+
| LR |
+-----------+
| PC | <--- R11 (FP)
+-----------+
| random |
+-----------+
```
gcc (arm):
Summary: R11 is the FP and points to the old LR.
```
+-----------+ 0x1000
| random |
+-----------+ 0x1004
| old FP |
+-----------+ 0x1008
| LR | <--- R11 (FP)
+-----------+
| random |
+-----------+
```
gcc (thumb):
Summary: useless for unwinding
```
+-----------+ 0x1000
| random | <--- R7 (FP)
+-----------+ 0x1004
| ... |
+-----------+ ???
| random |
+-----------+ ???
| old FP |
+-----------+ ???
| LR |
+-----------+
| random |
+-----------+
```
clang (arm):
Summary: R11 is the FP and points to the old FP.
```
+-----------+ 0x1000
| random |
+-----------+ 0x1004
| old FP | <--- R11 (FP)
+-----------+ 0x1008
| LR |
+-----------+
| random |
+-----------+
```
clang (thumb):
Summary: R7 is the FP and points to the old FP.
```
+-----------+ 0x1000
| random |
+-----------+ 0x1004
| old FP | <--- R7 (FP)
+-----------+ 0x1008
| LR |
+-----------+
| random |
+-----------+
```
Assuming that's right:
* We have to figure the mode for each frame. That _should_ be possible. For the current frame there's a mode bit. For everything else I believe we can look at the low bit of the LR to figure out ARM vs. thumb.
* If it's thumb, we assume the clang way to go since the gcc one is useless.
* If it's ARM, we can dereference the FP. If that points to something "executable" we've got gcc-arm or APCS-arm. We can figure out which one by looking one word up and seeing if _that_ points to something executable.
* If we want to fast-path the ARM code a bit, we could say that if the FP is on the same "page" as a known SP then we know it can't be executable.
ma...@google.com <ma...@google.com> #28
di...@google.com <di...@google.com> #29
remote:
remote:
remote:
---
I've only tested with toy code (see attached "build.sh" and "testcase.c"). Does anyone have an easy way to get a version of Chrome compile with frame pointers to do more testing?
---
I'll probably do at least one more read-through of the code before posting it upstream. I also may see if I can improve my toy code anymore to see if any additional problems come up. If anyone wants to review it on gerrit I'd be more that happy to take feedback, too (especially the clang leaf function one).
ma...@google.com <ma...@google.com> #30
di...@google.com <di...@google.com> #31
Ah, OK. I'll create a tryjob with it.
di...@google.com <di...@google.com> #32
...now we'll see what upstream thinks.
---
I have confirmed that with Manoj's patch plus my kernel patch that I can actually see something with perf. I used the tryjob and then recorded 10 seconds with the command that Sonny provided me:
perf record -a -g -o perf_test -- sleep 10
I then took the output file "perf_test" and copied it to my desktop. There, I downloaded the debug symbol files and then symlinked them all to make perf happy:
for f in $(find . -name '*.debug'); do
ln -s $(basename "${f}") $(dirname "${f}")/$(basename "${f}" .debug) 2>/dev/null
done
With that, I could analyze:
perf report -g flat -i perf_test --symfs debug/ --kallsyms /build/trogdor/boot/System.map-*
---
I also made my toy code just a hair more complicated. Re-attached.
---
Nick actually pointed out that we're still not going to be perfect. Specifically he believes that Chrome OS still uses gcc to compile "libc" and compiles it in "Thumb" mode. That means we'll have problems tracing: chrome -> libc -> kernel. In theory a way forward for this (aside from using clang to compile libc):
1. Get a version of clang that uses R11 for Thumb code (which is planned)
2. Turn on the config option in my 3rd patch to account for this.
3. Build libc with gcc ARM instead of gcc Thumb.
I'll leave that to someone else.
---
I'm not a big user of perf, so hopefully someone who uses it more than I do will confirm that this makes things better for them.
ev...@google.com <ev...@google.com> #33
Here's a random sidenote in case the toolchain team is interested. In a former life I was also quite annoyed by GCC creating frames that couldn't be unwound via frame pointer, so I
- The file contains unrelated stuff as well, it was a single patch that contained all our changes for the OS at the time. But the frame stuff sticks out pretty obviously since it's the only non-cosmetic edit I made to GCC.
- I hid it behind a -mframe-chain option.
- I wasn't aware of the clang version (or maybe it hadn't been written yet), so it may not match exactly. I think I pointed the frame base to unwind similar to ARM APCS so that my unwinder code was consistent.
- I opted for a two-push approach rather than just moving the frame pointer to R11 because for functions that didn't use R8-R12 I wanted to keep the 16-bit push/pops. Moving to R11 had the cost of needing the 32-bit push/pop instructions just for R11, which I worried about but never actually measured. Maybe omitting frame pointers in leaf functions helps with the bulk of it.
I'm not sure the toolchain team would really want to be carrying around a random GCC patch for ARM32 frame pointers, but I thought might serve as an interesting curiosity :)
di...@google.com <di...@google.com> #34
FYI that -mno-omit-leaf-frame-pointer
not working forked to
di...@google.com <di...@google.com> #35
Will: I can poke the upstream thread to, but maybe less annoying here? I'm assuming that you or Catalin will eventually review the patches and provide feedback (or feel free to land them!). If that's not true and I need to seek review elsewhere, please let me know! :-)
ll...@google.com <ll...@google.com> #36
We have been pinging the ARM llvm compiler team about this for a while...
So, there has been some progress towards fixing the problem in the ABI specification so that GCC and Clang can agree:
"The Arm 32-bit ABI introduced specification for a frame pointer/frame chain in early 2020, see the subsection "6.1.2.4. The frame pointer" under
At the moment, that specification is not implemented in gcc nor llvm.
For llvm, one of the Arm compiler teams have it on their roadmap to implement in late summer this year.
For gcc, implementing this is not definitely planned in yet. I've pinged the Arm gcc team about this and will share if/when they will implement it. It looks like the earliest shipping version of gcc that could implement this is gcc-12, which is planned to be released in May 2022"
So, we have asked if Arm gcc team could provide a fix earlier. Waiting to hear on that.
In the meantime, we can try building glibc in ARM mode if that is going to help. Need to check the code size impact.. (should be ok).
I created
ll...@google.com <ll...@google.com> #37
di...@google.com <di...@google.com> #38
- He doesn't want to land the heuristics. Before landing something he wants agreement about what the future holds and implementations in both gcc and clang.
- Will suggested that this should land in arm32 before arm64 compat. That makes sense, but dealing getting arm32 patches landed historically has been less fun than getting teeth pulled and I'd rather not do it unless we really think this is going to move forward.
- Will suggested we could use userspace for unwinding. Specifically, he said:
In lieu of that, I think we must defer to userspace to unwind using DWARF. Perf supports this via PERF_SAMPLE_STACK_USER and PERF_SAMPLE_REGS_USER, which allows libunwind to be used to create the callchain. You haven't mentioned that here, so I'd be interested to know why not.
So I guess our options are:
- Keep ignoring this for the next year or so until our arm32 problem vanishes because NaCL is finally deprecated and we can switch fully to arm64.
- Land my patch as a private patch in the Chrome OS tree and then revert it / drop it when it's no longer needed. IMO this is probably fine to do since it's pretty isolated and isn't a long term support burden (we won't need it once we transition to fully arm64).
- Figure out how to use libunwind.
I'd tend to go with option #2 unless there are objections.
Before landing private patches for this, though, we need to make sure it'll actually be useful. That means someone from the toolchain team needs to:
- Figure out how to get glibc compiled in a way that it doesn't impede our tracing.
- Enable frame pointers on Chrome.
- Confirm that the performance impact of turning on frame pointers doesn't somehow kill everything.
I'm probably not the right person to drive any of that, since I can barely run perf
at all without someone whispering commands in my ear.
ll...@google.com <ll...@google.com> #39
(Still trying to understand the SLO notifications with buganizer)
bl...@google.com <bl...@google.com> #40
Automated by Blunderbuss job chromeos-toolchain-blunderbuss for component 1038090.
ma...@google.com <ma...@google.com>
ma...@google.com <ma...@google.com> #41
ma...@google.com <ma...@google.com> #42
di...@google.com <di...@google.com> #43
At this point should we just close this? If we haven't fixed it by now, it seems unlikely we'll fix it soon since many boards are transitioning to arm64...
js...@google.com <js...@google.com> #44
gb...@google.com <gb...@google.com> #45
The people have spoken(tm). If this is strongly needed for some reason in the future, please feel free to reopen.
Description
GCC and Clang currently differ in how they handle stack frames in thumb mode and Linux kernel tends to work only with GCC code.
This came up recently in
Upstream llvm bug (closed) with some details:
As a result, perf record "-g" i.e. with call graphs is broken on Thumb.
Wondering if we can do something regarding this inside the scope of clang+kernel work.