|
|
| Re: SPU problems with Linux 3 [message #691 is a reply to message #689] |
Wed, 29 February 2012 20:28   |
glevand
Messages: 955 Registered: July 2011 Location: SONY
|
Gitbrew God |
|
|
Having a similar problem on Linux 3.2.8
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
ppu_hello/1924 is trying to release lock (&sb->s_type->i_mutex_key) at:
[<d000000003773fd4>] .do_spu_create+0x84/0xd0 [spufs]
but there are no more locks to release!
other info that might help us debug this:
no locks held by ppu_hello/1924.
stack backtrace:
Call Trace:
[c000000006f7fa30] [c0000000000129a0] .show_stack+0x70/0x1a8 (unreliable)
[c000000006f7fae0] [c00000000008d36c] .print_unlock_inbalance_bug+0x114/0x120
[c000000006f7fb70] [c000000000092838] .lock_release+0x1d8/0x210
[c000000006f7fc20] [c0000000004568fc] .__mutex_unlock_slowpath+0xbc/0x228
[c000000006f7fcc0] [d000000003773fd4] .do_spu_create+0x84/0xd0 [spufs]
[c000000006f7fd70] [c000000000037754] .sys_spu_create+0xfc/0x1f0
[c000000006f7fe30] [c0000000000097d8] syscall_exit+0x0/0x40
------------[ cut here ]------------
kernel BUG at fs/dcache.c:474!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2 PS3
Modules linked in: spufs ps3_jupiter_sta btusb bluetooth ps3_jupiter snd_ps3 snd _pcm snd_timer ohci_hcd snd ehci_hcd sg usbcore ps3nflash ps3_disp_manager ps3_l pm usb_common soundcore ps3rom ps3flash rtc_ps3 snd_page_alloc
NIP: c00000000011b53c LR: c00000000011b52c CTR: c0000000000ae730
REGS: c000000006f7f930 TRAP: 0700 Not tainted (3.2.8)
MSR: 8000000000028032 <EE,CE,IR,DR> CR: 22080422 XER: 20000000
TASK = c000000006ebab40[1924] 'ppu_hello' THREAD: c000000006f7c000 CPU: 1
GPR00: c00000000011b52c c000000006f7fbb0 c0000000006dcac8 c000000005fe5b78
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
GPR08: 0000000000000000 0000000000000001 c00000000011b52c 0000000000000000
GPR12: 0000000042080424 c000000007ffe280 0000000000000000 00000000100f0000
GPR16: 00000000100f5678 00000000100dc70c 00000000100f5820 00000000100f5790
GPR20: 00000000100f5598 0000000000000000 0000000000000000 00000000100f0000
GPR24: 0000000000000000 c000000006a3c690 0000000040000010 c0000000060d4080
GPR28: c000000005fe5b18 c000000005fe5b78 c00000000066f400 c000000005fe5b18
NIP [c00000000011b53c] .dput+0x54/0x288
LR [c00000000011b52c] .dput+0x44/0x288
Call Trace:
[c000000006f7fbb0] [c00000000011b52c] .dput+0x44/0x288 (unreliable)
[c000000006f7fc50] [c000000000101d90] .fput+0x1d8/0x2f0
[c000000006f7fd00] [c0000000000fd080] .filp_close+0xa8/0xe8
[c000000006f7fd90] [c0000000000fea84] .SyS_close+0xcc/0x130
[c000000006f7fe30] [c0000000000097d8] syscall_exit+0x0/0x40
Instruction dump:
fb81ffe0 fba1ffe8 91810008 f821ff61 418e0080 3bbf0060 7fa3eb78 4833c581
60000000 817f0058 7d690074 7929d182 <0b090000> 2b8b0001 419d013c 801f0000
---[ end trace 4684d6840129b57f ]---
glevand@debian:~/spu_hello$
Compiled and installed libspe2 from SVN: https://libspe.svn.sourceforge.net/svnroot/libspe
Tested SPU program from here: http://cellperformance.beyond3d.com/articles/2006/11/cross-c ompiling-for-ps3-linux-1.html
The program worked fine but upon closing SPU, the exception was raised.
I wonder if kernel devs are aware of it.
[Updated on: Wed, 29 February 2012 20:32] Report message to a moderator
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Re: SPU problems with Linux 3 [message #724 is a reply to message #723] |
Fri, 02 March 2012 22:34   |
glevand
Messages: 955 Registered: July 2011 Location: SONY
|
Gitbrew God |
|
|
With this patch, one exception less 
--- arch/powerpc/platforms/cell/spufs/syscalls.c 2012-03-01 01:32:49.000000000 +0100
+++ arch/powerpc/platforms/cell/spufs/syscalls.c.new 2012-03-02 21:24:46.616534770 +0100
@@ -70,7 +70,6 @@
ret = PTR_ERR(dentry);
if (!IS_ERR(dentry)) {
ret = spufs_create(&path, dentry, flags, mode, neighbor);
- mutex_unlock(&path.dentry->d_inode->i_mutex);
dput(dentry);
path_put(&path);
}
[Updated on: Fri, 02 March 2012 22:42] Report message to a moderator
|
|
|
|
|
|
|
|
| Re: SPU problems with Linux 3 [message #728 is a reply to message #727] |
Sat, 03 March 2012 11:48   |
glevand
Messages: 955 Registered: July 2011 Location: SONY
|
Gitbrew God |
|
|
We need a volunteer to submit this patch to linux kernel before 3.3. release.
I cannot do it for obvious reasons I doubt they will accept anything from me, unlike freebsd. Too many SONY and IBM pussies are working on Linux powerpc.
[Updated on: Sat, 03 March 2012 12:03] Report message to a moderator
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Re: again SPU problems with Linux 3 [message #1166 is a reply to message #1162] |
Fri, 27 April 2012 10:15   |
masterzorag
Messages: 99 Registered: August 2011 Location: maiworld
|
Gitbrew Member |
|
|
glevand wrote on Thu, 26 April 2012 18:30Great, another bug with SPU.
Did you apply my SPU patch ? Maybe we should try without it, maybe that's because of this patch. Just an idea.
without your patch, one step less...
=====================================
[ BUG: bad unlock balance detected! ]
3.3.3 #1 Not tainted
-------------------------------------
test/900 is trying to release lock (&sb->s_type->i_mutex_key) at:
[<d0000000005ec900>] .do_spu_create+0x90/0xd8 [spufs]
but there are no more locks to release!
other info that might help us debug this:
no locks held by test/900.
stack backtrace:
Call Trace:
[c0000000069d7a30] [c000000000011120] .show_stack+0x6c/0x16c (unreliable)
[c0000000069d7ae0] [c000000000084928] .print_unlock_inbalance_bug+0xec/0x114
[c0000000069d7b70] [c000000000086edc] .lock_release+0xd8/0x200
[c0000000069d7c10] [c000000000388dac] .__mutex_unlock_slowpath+0x11c/0x1d8
[c0000000069d7cb0] [d0000000005ec900] .do_spu_create+0x90/0xd8 [spufs]
[c0000000069d7d70] [c0000000000352b8] .SyS_spu_create+0x16c/0x1c8
[c0000000069d7e30] [c0000000000097d8] syscall_exit+0x0/0x40
------------[ cut here ]------------
Kernel BUG at c00000000010eb70 [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2 NUMA PS3
Modules linked in: spufs dm_mod btusb bluetooth usb_storage snd_ps3 snd_pcm snd_timer snd ohci_hcd ps3_disp_manager soundcore ehci_hcd rtc_ps3 ps3vram ps3_lpm snd_page_alloc ps3flash usbcore usb_common [last unl
oaded: scsi_wait_scan]
NIP: c00000000010eb70 LR: c00000000010eb60 CTR: c0000000000a08cc
REGS: c0000000069d7930 TRAP: 0700 Not tainted (3.3.3)
MSR: 8000000000028032 <SF,EE,IR,DR,RI> CR: 22004822 XER: 20000000
TASK = c000000006a8e6c0[900] 'test' THREAD: c0000000069d4000 CPU: 1
GPR00: 0000000000000001 c0000000069d7bb0 c0000000005d5720 c0000000053c00d0
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
GPR08: 0000000000000000 0000000000000000 c00000000010eb60 c000000006a8e6c0
GPR12: 0000000082004824 c000000007ffe280 0000000000000004 00000000f7891688
GPR16: 00000000f7871734 00000000f78927a4 00000000f7893008 00000000f78927a8
GPR20: 00000000ffa29300 000000000fd958a0 0000000000000000 000000000000000d
GPR24: 000000000fd98240 c00000000687a990 0000000040000010 c0000000060d34a0
GPR28: c0000000053c0070 c0000000053c00d0 c000000000576b60 c0000000053c0070
NIP [c00000000010eb70] .dput+0x48/0x214
LR [c00000000010eb60] .dput+0x38/0x214
Call Trace:
[c0000000069d7bb0] [c00000000010eb60] .dput+0x38/0x214 (unreliable)
[c0000000069d7c50] [c0000000000f6214] .fput+0x24c/0x288
[c0000000069d7d00] [c0000000000f21dc] .filp_close+0xbc/0xe4
[c0000000069d7d90] [c0000000000f22d4] .SyS_close+0xd0/0x128
[c0000000069d7e30] [c0000000000097d8] syscall_exit+0x0/0x40
Instruction dump:
fb61ffd8 fb81ffe0 fba1ffe8 f821ff61 418201c8 3bbf0060 7fa3eb78 4827d375
60000000 813f0058 7d200074 7800d182 <0b000000> 2b890001 409d0010 3809ffff
---[ end trace 6735bcf43efb12ff ]---
------------[ cut here ]------------
Kernel BUG at c00000000010eb70 [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#2]
SMP NR_CPUS=2 NUMA PS3
Modules linked in: spufs dm_mod btusb bluetooth usb_storage snd_ps3 snd_pcm snd_timer snd ohci_hcd ps3_disp_manager soundcore ehci_hcd rtc_ps3 ps3vram ps3_lpm snd_page_alloc ps3flash usbcore usb_common [last unloaded: scsi_wait_scan]
NIP: c00000000010eb70 LR: c00000000010eb60 CTR: c0000000000a08cc
REGS: c0000000069d6c10 TRAP: 0700 Tainted: G D (3.3.3)
MSR: 8000000000028032 <SF,EE,IR,DR,RI> CR: 22004822 XER: 20000000
TASK = c000000006a8e6c0[900] 'test' THREAD: c0000000069d4000 CPU: 1
GPR00: 0000000000000001 c0000000069d6e90 c0000000005d5720 c0000000053cca80
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
GPR08: 0000000000000000 0000000000000000 c00000000010eb60 c000000006a8e6c0
GPR12: 0000000042004824 c000000007ffe280 0000000000000004 00000000f7891688
GPR16: 00000000f7871734 00000000f78927a4 00000000f7893008 00000000f78927a8
GPR20: 00000000ffa29300 000000000fd958a0 0000000000000001 000000000000000d
GPR24: 000000000fd98240 c0000000061afb10 0000000000000008 c00000000e43b010
GPR28: c0000000053cca20 c0000000053cca80 c000000000576b60 c0000000053cca20
NIP [c00000000010eb70] .dput+0x48/0x214
LR [c00000000010eb60] .dput+0x38/0x214
Call Trace:
[c0000000069d6e90] [c00000000010eb60] .dput+0x38/0x214 (unreliable)
[c0000000069d6f30] [c0000000000f6214] .fput+0x24c/0x288
[c0000000069d6fe0] [c0000000000ca578] .remove_vma+0x68/0xcc
[c0000000069d7070] [c0000000000ca6ec] .exit_mmap+0x110/0x14c
[c0000000069d7190] [c00000000003d844] .mmput+0x74/0x154
[c0000000069d7220] [d0000000005ec390] .spu_forget+0x54/0x7c [spufs]
[c0000000069d72b0] [d0000000005e6294] .spufs_dir_close+0x8c/0xc8 [spufs]
[c0000000069d7360] [c0000000000f6140] .fput+0x178/0x288
[c0000000069d7410] [c0000000000f21dc] .filp_close+0xbc/0xe4
[c0000000069d74a0] [c000000000042824] .put_files_struct+0xf4/0x1b8
[c0000000069d7550] [c000000000044654] .do_exit+0x244/0x738
[c0000000069d7660] [c0000000000194d0] .die+0x34c/0x374
[c0000000069d7700] [c0000000000198a0] ._exception+0x88/0x174
[c0000000069d78c0] [c000000000005314] program_check_common+0x114/0x180
--- Exception: 700 at .dput+0x48/0x214
LR = .dput+0x38/0x214
[c0000000069d7c50] [c0000000000f6214] .fput+0x24c/0x288
[c0000000069d7d00] [c0000000000f21dc] .filp_close+0xbc/0xe4
[c0000000069d7d90] [c0000000000f22d4] .SyS_close+0xd0/0x128
[c0000000069d7e30] [c0000000000097d8] syscall_exit+0x0/0x40
Instruction dump:
fb61ffd8 fb81ffe0 fba1ffe8 f821ff61 418201c8 3bbf0060 7fa3eb78 4827d375
60000000 813f0058 7d200074 7800d182 <0b000000> 2b890001 409d0010 3809ffff
---[ end trace 6735bcf43efb1300 ]---
Fixing recursive fault but reboot is needed!
|
|
|
|
|
|
| Re: again SPU problems with Linux 3 [message #1234 is a reply to message #1168] |
Fri, 04 May 2012 15:53   |
masterzorag
Messages: 99 Registered: August 2011 Location: maiworld
|
Gitbrew Member |
|
|
devs committed a patch for the issue, tested on 3.3.3:
first issue gone, no exceptions in do_spu_create, but it falls again in possible circular locking dependency detected... at the end, same effect of your patch.
kernel 3.4-rc1 boots (patched fine with your 3.3_patchset), but same issue
very strange that issue happens only on first run:
after first run the issue does not appears, but the program slow down as issue ever exist!
pinged devs as 'possible circular locking dependency', waiting for a fix...
update: same results with 3.4.0
[Updated on: Fri, 25 May 2012 17:09] Report message to a moderator
|
|
|
|
| Re: again SPU problems with Linux 3 [message #1424 is a reply to message #1234] |
Wed, 13 June 2012 11:17   |
masterzorag
Messages: 99 Registered: August 2011 Location: maiworld
|
Gitbrew Member |
|
|
testing glevand's prebuilt kernels here, spufs issue is fixed until 3.2.20.
on 3.3 and 3.4 we fault in this issue, I've noticed some debug info has changed, so I post new debug
======================================================
[ INFO: possible circular locking dependency detected ]
3.3.8 #1 Not tainted
-------------------------------------------------------
perlin/997 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: [<d000000002c23004>] .spufs_ps_fault.isra.17+0x1e4/0x220 [spufs]
but task is already holding lock:
(&ctx->state_mutex){+.+...}, at: [<d000000002c22fd8>] .spufs_ps_fault.isra.17+0x1b8/0x220 [spufs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&ctx->state_mutex){+.+...}:
[<c00000000045e964>] .mutex_lock_interruptible_nested+0x74/0x508
[<d000000002c224f8>] .spufs_mem_mmap_fault+0xd8/0x190 [spufs]
[<c0000000000d8138>] .__do_fault+0xc0/0x6c8
[<c000000000027714>] .do_page_fault+0x214/0x640
[<c0000000000059e0>] handle_page_fault+0x20/0x74
-> #0 (&mm->mmap_sem){++++++}:
[<c000000000093934>] .lock_acquire+0x9c/0xd8
[<c000000000460194>] .down_read+0x34/0x7c
[<d000000002c23004>] .spufs_ps_fault.isra.17+0x1e4/0x220 [spufs]
[<c0000000000d8138>] .__do_fault+0xc0/0x6c8
[<c000000000027714>] .do_page_fault+0x214/0x640
[<c0000000000059e0>] handle_page_fault+0x20/0x74
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ctx->state_mutex);
lock(&mm->mmap_sem);
lock(&ctx->state_mutex);
lock(&mm->mmap_sem);
*** DEADLOCK ***
1 lock held by perlin/997:
#0: (&ctx->state_mutex){+.+...}, at: [<d000000002c22fd8>] .spufs_ps_fault.isra.17+0x1b8/0x220 [spufs]
stack backtrace:
Call Trace:
[c000000006a176d0] [c000000000012a20] .show_stack+0x70/0x1a8 (unreliable)
[c000000006a17780] [c00000000046418c] .print_circular_bug+0x308/0x340
[c000000006a17830] [c000000000093328] .__lock_acquire+0x1ca0/0x1cd0
[c000000006a179b0] [c000000000093934] .lock_acquire+0x9c/0xd8
[c000000006a17a80] [c000000000460194] .down_read+0x34/0x7c
[c000000006a17b00] [d000000002c23004] .spufs_ps_fault.isra.17+0x1e4/0x220 [spufs]
[c000000006a17be0] [c0000000000d8138] .__do_fault+0xc0/0x6c8
[c000000006a17cf0] [c000000000027714] .do_page_fault+0x214/0x640
[c000000006a17e30] [c0000000000059e0] handle_page_fault+0x20/0x74
and here is the issue's effect: slow down runtime
uname -a
Linux fedora_clone 3.3.8 #1 SMP Tue Jun 5 18:18:00 CEST 2012 ppc64 ppc64 ppc64 GNU/Linux
./perlin
OpenCL took 91.365402 seconds to compute 1000 frames. Pixel Rate = 11.476729 Mpixels/sec, Frame Rate = 10.945062 frames/sec
Host code took 12.527953 seconds to compute 10 frames. Pixel Rate = 0.836989 Mpixels/sec, Frame Rate = 0.798215 frames/sec
OpenCL provided a 13.711922 speedup
instead of
uname -a
Linux fedora_clone 3.2.19 #1 SMP Thu Jun 7 08:59:29 CEST 2012 ppc64 ppc64 ppc64 GNU/Linux
./perlin
OpenCL took 22.466829 seconds to compute 1000 frames. Pixel Rate = 46.672184 Mpixels/sec, Frame Rate = 44.510064 frames/sec
Host code took 12.905138 seconds to compute 10 frames. Pixel Rate = 0.812526 Mpixels/sec, Frame Rate = 0.774885 frames/sec
OpenCL provided a 57.440851 speedup
in this case we spend more than 4x time!
(cell can vectorize by 4 and this issue, in this case, nullify the potential gain; it's not so mathematical, gain it's absolutly relative; it's only a case, but let you see the effect)
|
|
|
|
|
|
|
|
|
|
|
|
|
|