Official Gitbrew Forums
These forums cover all aspects of the Gitbrew biosphere. If you would like your development project to be based here, ask an admin.

Home » PS3 » OtherOS++ » Linux » SPU problems with Linux 3
SPU problems with Linux 3 [message #689] Wed, 29 February 2012 20:27 Go to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
It seems that we have problems with SPUs on Linux 3.
Re: SPU problems with Linux 3 [message #691 is a reply to message #689] Wed, 29 February 2012 20:28 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Having a similar problem on Linux 3.2.8

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
ppu_hello/1924 is trying to release lock (&sb->s_type->i_mutex_key) at:
[<d000000003773fd4>] .do_spu_create+0x84/0xd0 [spufs]
but there are no more locks to release!

other info that might help us debug this:
no locks held by ppu_hello/1924.

stack backtrace:
Call Trace:
[c000000006f7fa30] [c0000000000129a0] .show_stack+0x70/0x1a8 (unreliable)
[c000000006f7fae0] [c00000000008d36c] .print_unlock_inbalance_bug+0x114/0x120
[c000000006f7fb70] [c000000000092838] .lock_release+0x1d8/0x210
[c000000006f7fc20] [c0000000004568fc] .__mutex_unlock_slowpath+0xbc/0x228
[c000000006f7fcc0] [d000000003773fd4] .do_spu_create+0x84/0xd0 [spufs]
[c000000006f7fd70] [c000000000037754] .sys_spu_create+0xfc/0x1f0
[c000000006f7fe30] [c0000000000097d8] syscall_exit+0x0/0x40
------------[ cut here ]------------
kernel BUG at fs/dcache.c:474!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2 PS3
Modules linked in: spufs ps3_jupiter_sta btusb bluetooth ps3_jupiter snd_ps3 snd                         _pcm snd_timer ohci_hcd snd ehci_hcd sg usbcore ps3nflash ps3_disp_manager ps3_l                         pm usb_common soundcore ps3rom ps3flash rtc_ps3 snd_page_alloc
NIP: c00000000011b53c LR: c00000000011b52c CTR: c0000000000ae730
REGS: c000000006f7f930 TRAP: 0700   Not tainted  (3.2.8)
MSR: 8000000000028032 <EE,CE,IR,DR>  CR: 22080422  XER: 20000000
TASK = c000000006ebab40[1924] 'ppu_hello' THREAD: c000000006f7c000 CPU: 1
GPR00: c00000000011b52c c000000006f7fbb0 c0000000006dcac8 c000000005fe5b78
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
GPR08: 0000000000000000 0000000000000001 c00000000011b52c 0000000000000000
GPR12: 0000000042080424 c000000007ffe280 0000000000000000 00000000100f0000
GPR16: 00000000100f5678 00000000100dc70c 00000000100f5820 00000000100f5790
GPR20: 00000000100f5598 0000000000000000 0000000000000000 00000000100f0000
GPR24: 0000000000000000 c000000006a3c690 0000000040000010 c0000000060d4080
GPR28: c000000005fe5b18 c000000005fe5b78 c00000000066f400 c000000005fe5b18
NIP [c00000000011b53c] .dput+0x54/0x288
LR [c00000000011b52c] .dput+0x44/0x288
Call Trace:
[c000000006f7fbb0] [c00000000011b52c] .dput+0x44/0x288 (unreliable)
[c000000006f7fc50] [c000000000101d90] .fput+0x1d8/0x2f0
[c000000006f7fd00] [c0000000000fd080] .filp_close+0xa8/0xe8
[c000000006f7fd90] [c0000000000fea84] .SyS_close+0xcc/0x130
[c000000006f7fe30] [c0000000000097d8] syscall_exit+0x0/0x40
Instruction dump:
fb81ffe0 fba1ffe8 91810008 f821ff61 418e0080 3bbf0060 7fa3eb78 4833c581
60000000 817f0058 7d690074 7929d182 <0b090000> 2b8b0001 419d013c 801f0000
---[ end trace 4684d6840129b57f ]---
glevand@debian:~/spu_hello$


Compiled and installed libspe2 from SVN: https://libspe.svn.sourceforge.net/svnroot/libspe

Tested SPU program from here: http://cellperformance.beyond3d.com/articles/2006/11/cross-c ompiling-for-ps3-linux-1.html


The program worked fine but upon closing SPU, the exception was raised.

I wonder if kernel devs are aware of it.

[Updated on: Wed, 29 February 2012 20:32]

Report message to a moderator

Re: SPU problems with Linux 3 [message #693 is a reply to message #691] Wed, 29 February 2012 21:22 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
There are actually 2 exceptions.
One during creating SPU and one when closing SPU.
Re: SPU problems with Linux 3 [message #694 is a reply to message #693] Wed, 29 February 2012 21:24 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
do_spu_create was changed between 3.0.22 and 3.1.

This is this the commit:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=1ba106818615faddb63ba782f85f3498b9eb61c6

[Updated on: Wed, 29 February 2012 21:33]

Report message to a moderator

Re: SPU problems with Linux 3 [message #695 is a reply to message #693] Thu, 01 March 2012 13:00 Go to previous messageGo to next message
masterzorag is currently offline  masterzorag
Messages: 99
Registered: August 2011
Location: maiworld
Gitbrew Member
glevand wrote on Wed, 29 February 2012 21:22
There are actually 2 exceptions.
One during creating SPU and one when closing SPU.

Yes, two of them, same issue;
Even on my OpenCL program, it computes correctly, but slower.
Re: SPU problems with Linux 3 [message #700 is a reply to message #695] Thu, 01 March 2012 18:20 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Did you test Linux kernel 3.0.22 ? Maybe we should do it first and see if it works.
Re: SPU problems with Linux 3 [message #701 is a reply to message #700] Thu, 01 March 2012 18:48 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Tested Linux Kernel 3.0.22.
SPUs work fine, no kernel exceptions.
Re: SPU problems with Linux 3 [message #703 is a reply to message #701] Thu, 01 March 2012 18:55 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
I think we should report this bug before 3.3 release.
I will try to fix it myself too.
Re: SPU problems with Linux 3 [message #723 is a reply to message #703] Fri, 02 March 2012 22:15 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Yeah, definetly a bug.
First time, the mutex gets unlocked in spufs_create_context.
And then the second time in do_spu_create.
Really funny, That means, the person who commited it, never tested it actually Smile
Re: SPU problems with Linux 3 [message #724 is a reply to message #723] Fri, 02 March 2012 22:34 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
With this patch, one exception less Smile

--- arch/powerpc/platforms/cell/spufs/syscalls.c        2012-03-01 01:32:49.000000000 +0100
+++ arch/powerpc/platforms/cell/spufs/syscalls.c.new    2012-03-02 21:24:46.616534770 +0100
@@ -70,7 +70,6 @@
        ret = PTR_ERR(dentry);
        if (!IS_ERR(dentry)) {
                ret = spufs_create(&path, dentry, flags, mode, neighbor);
-               mutex_unlock(&path.dentry->d_inode->i_mutex);
                dput(dentry);
                path_put(&path);
        }

[Updated on: Fri, 02 March 2012 22:42]

Report message to a moderator

Re: SPU problems with Linux 3 [message #726 is a reply to message #724] Sat, 03 March 2012 11:23 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Hmm, regarding the second problem.
It seems that SPU main directory dentry has invalid d_count.

dput:475: spethread-1394-270852128
------------[ cut here ]------------
kernel BUG at fs/dcache.c:476!
Re: SPU problems with Linux 3 [message #727 is a reply to message #726] Sat, 03 March 2012 11:41 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Ok, fixed both problems.
Here is the patch, works fine now.

http://gitbrew.org/~glevand/ps3/linux/linux-3/patches/syscal l-spu-create-unlock-dput-fix.patch
Re: SPU problems with Linux 3 [message #728 is a reply to message #727] Sat, 03 March 2012 11:48 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
We need a volunteer to submit this patch to linux kernel before 3.3. release.
I cannot do it for obvious reasons Smile I doubt they will accept anything from me, unlike freebsd. Too many SONY and IBM pussies are working on Linux powerpc.

[Updated on: Sat, 03 March 2012 12:03]

Report message to a moderator

Re: SPU problems with Linux 3 [message #729 is a reply to message #728] Sat, 03 March 2012 12:02 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Uploaded my latest Linux 3.2.9 kernel with SPU bug fixed: http://gitbrew.org/~glevand/ps3/linux/linux-3/linux-3.2.9-bu ild.tar.bz2
In case someone wants to test it.
Re: SPU problems with Linux 3 [message #730 is a reply to message #729] Sat, 03 March 2012 14:24 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Intersting info about SPUFS implementation: http://diploma-thesis.siewior.net/html/diplomarbeitch5.html
Re: SPU problems with Linux 3 [message #745 is a reply to message #727] Mon, 05 March 2012 10:44 Go to previous messageGo to next message
masterzorag is currently offline  masterzorag
Messages: 99
Registered: August 2011
Location: maiworld
Gitbrew Member
patch tested, on 3.2.5 issues gone!
Re: SPU problems with Linux 3 [message #752 is a reply to message #745] Mon, 05 March 2012 18:12 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Cool, want to submit it to Linux powerpc mailing list ?
Re: SPU problems with Linux 3 [message #800 is a reply to message #752] Wed, 07 March 2012 11:42 Go to previous messageGo to next message
masterzorag is currently offline  masterzorag
Messages: 99
Registered: August 2011
Location: maiworld
Gitbrew Member
glevand wrote on Mon, 05 March 2012 18:12
Cool, want to submit it to Linux powerpc mailing list ?

http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/ms g57323.html

[Updated on: Wed, 07 March 2012 11:46]

Report message to a moderator

Re: SPU problems with Linux 3 [message #801 is a reply to message #800] Wed, 07 March 2012 16:35 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Thanks
icon4.gif  again SPU problems with Linux 3 [message #1159 is a reply to message #801] Thu, 26 April 2012 12:48 Go to previous messageGo to next message
masterzorag is currently offline  masterzorag
Messages: 99
Registered: August 2011
Location: maiworld
Gitbrew Member
resolved the need of initramfs, falling again into SPU problems:
as ever, I'm running my OpenCL test program, the code runs fine, but slower than possible, no errors on 3.2.5!
======================================================
[ INFO: possible circular locking dependency detected ]
3.3.3 #1 Not tainted
-------------------------------------------------------
test/980 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<d0000000005ea408>] .spufs_ps_fault+0x160/0x210 [spufs]

but task is already holding lock:
 (&ctx->state_mutex){+.+...}, at: [<d0000000005ea3dc>] .spufs_ps_fault+0x134/0x210 [spufs]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&ctx->state_mutex){+.+...}:
       [<c0000000003890e4>] .mutex_lock_interruptible_nested+0x70/0x4a0
       [<d0000000005ebd04>] .spufs_mem_mmap_fault+0xb0/0x160 [spufs]
       [<c0000000000c3068>] .__do_fault+0x108/0x5e8
       [<c000000000026470>] .do_page_fault+0x374/0x510
       [<c0000000000059e0>] handle_page_fault+0x20/0x74

-> #0 (&mm->mmap_sem){++++++}:
       [<c000000000086b0c>] .lock_acquire+0x98/0xcc
       [<c00000000038a4f0>] .down_read+0x34/0x78
       [<d0000000005ea408>] .spufs_ps_fault+0x160/0x210 [spufs]
       [<c0000000000c3068>] .__do_fault+0x108/0x5e8
       [<c000000000026470>] .do_page_fault+0x374/0x510
       [<c0000000000059e0>] handle_page_fault+0x20/0x74

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ctx->state_mutex);
                               lock(&mm->mmap_sem);
                               lock(&ctx->state_mutex);
  lock(&mm->mmap_sem);

 *** DEADLOCK ***

1 lock held by test/980:
 #0:  (&ctx->state_mutex){+.+...}, at: [<d0000000005ea3dc>] .spufs_ps_fault+0x134/0x210 [spufs]

stack backtrace:
Call Trace:
[c00000000e5af720] [c000000000011120] .show_stack+0x6c/0x16c (unreliable)
[c00000000e5af7d0] [c0000000000847b4] .print_circular_bug+0x2f8/0x330
[c00000000e5af880] [c000000000086334] .__lock_acquire+0x1318/0x1a58
[c00000000e5af9d0] [c000000000086b0c] .lock_acquire+0x98/0xcc
[c00000000e5afaa0] [c00000000038a4f0] .down_read+0x34/0x78
[c00000000e5afb20] [d0000000005ea408] .spufs_ps_fault+0x160/0x210 [spufs]
[c00000000e5afc00] [c0000000000c3068] .__do_fault+0x108/0x5e8
[c00000000e5afd00] [c000000000026470] .do_page_fault+0x374/0x510
[c00000000e5afe30] [c0000000000059e0] handle_page_fault+0x20/0x74

should we ping again ppc kernel devs?
Re: again SPU problems with Linux 3 [message #1162 is a reply to message #1159] Thu, 26 April 2012 18:30 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Great, another bug with SPU.
Did you apply my SPU patch ? Maybe we should try without it, maybe that's because of this patch. Just an idea.

[Updated on: Thu, 26 April 2012 18:37]

Report message to a moderator

Re: again SPU problems with Linux 3 [message #1166 is a reply to message #1162] Fri, 27 April 2012 10:15 Go to previous messageGo to next message
masterzorag is currently offline  masterzorag
Messages: 99
Registered: August 2011
Location: maiworld
Gitbrew Member
glevand wrote on Thu, 26 April 2012 18:30
Great, another bug with SPU.
Did you apply my SPU patch ? Maybe we should try without it, maybe that's because of this patch. Just an idea.

without your patch, one step less...
=====================================
[ BUG: bad unlock balance detected! ]
3.3.3 #1 Not tainted
-------------------------------------
test/900 is trying to release lock (&sb->s_type->i_mutex_key) at:
[<d0000000005ec900>] .do_spu_create+0x90/0xd8 [spufs]
but there are no more locks to release!

other info that might help us debug this:
no locks held by test/900.

stack backtrace:
Call Trace:
[c0000000069d7a30] [c000000000011120] .show_stack+0x6c/0x16c (unreliable)
[c0000000069d7ae0] [c000000000084928] .print_unlock_inbalance_bug+0xec/0x114
[c0000000069d7b70] [c000000000086edc] .lock_release+0xd8/0x200
[c0000000069d7c10] [c000000000388dac] .__mutex_unlock_slowpath+0x11c/0x1d8
[c0000000069d7cb0] [d0000000005ec900] .do_spu_create+0x90/0xd8 [spufs]
[c0000000069d7d70] [c0000000000352b8] .SyS_spu_create+0x16c/0x1c8
[c0000000069d7e30] [c0000000000097d8] syscall_exit+0x0/0x40
------------[ cut here ]------------
Kernel BUG at c00000000010eb70 [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2 NUMA PS3
Modules linked in: spufs dm_mod btusb bluetooth usb_storage snd_ps3 snd_pcm snd_timer snd ohci_hcd ps3_disp_manager soundcore ehci_hcd rtc_ps3 ps3vram ps3_lpm snd_page_alloc ps3flash usbcore usb_common [last unl
oaded: scsi_wait_scan]
NIP: c00000000010eb70 LR: c00000000010eb60 CTR: c0000000000a08cc
REGS: c0000000069d7930 TRAP: 0700   Not tainted  (3.3.3)
MSR: 8000000000028032 <SF,EE,IR,DR,RI>  CR: 22004822  XER: 20000000
TASK = c000000006a8e6c0[900] 'test' THREAD: c0000000069d4000 CPU: 1
GPR00: 0000000000000001 c0000000069d7bb0 c0000000005d5720 c0000000053c00d0 
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002 
GPR08: 0000000000000000 0000000000000000 c00000000010eb60 c000000006a8e6c0 
GPR12: 0000000082004824 c000000007ffe280 0000000000000004 00000000f7891688 
GPR16: 00000000f7871734 00000000f78927a4 00000000f7893008 00000000f78927a8 
GPR20: 00000000ffa29300 000000000fd958a0 0000000000000000 000000000000000d 
GPR24: 000000000fd98240 c00000000687a990 0000000040000010 c0000000060d34a0 
GPR28: c0000000053c0070 c0000000053c00d0 c000000000576b60 c0000000053c0070 
NIP [c00000000010eb70] .dput+0x48/0x214
LR [c00000000010eb60] .dput+0x38/0x214
Call Trace:
[c0000000069d7bb0] [c00000000010eb60] .dput+0x38/0x214 (unreliable)
[c0000000069d7c50] [c0000000000f6214] .fput+0x24c/0x288
[c0000000069d7d00] [c0000000000f21dc] .filp_close+0xbc/0xe4
[c0000000069d7d90] [c0000000000f22d4] .SyS_close+0xd0/0x128
[c0000000069d7e30] [c0000000000097d8] syscall_exit+0x0/0x40
Instruction dump:
fb61ffd8 fb81ffe0 fba1ffe8 f821ff61 418201c8 3bbf0060 7fa3eb78 4827d375 
60000000 813f0058 7d200074 7800d182 <0b000000> 2b890001 409d0010 3809ffff 
---[ end trace 6735bcf43efb12ff ]---

------------[ cut here ]------------
Kernel BUG at c00000000010eb70 [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#2]
SMP NR_CPUS=2 NUMA PS3
Modules linked in: spufs dm_mod btusb bluetooth usb_storage snd_ps3 snd_pcm snd_timer snd ohci_hcd ps3_disp_manager soundcore ehci_hcd rtc_ps3 ps3vram ps3_lpm snd_page_alloc ps3flash usbcore usb_common [last unloaded: scsi_wait_scan]
NIP: c00000000010eb70 LR: c00000000010eb60 CTR: c0000000000a08cc
REGS: c0000000069d6c10 TRAP: 0700   Tainted: G      D       (3.3.3)
MSR: 8000000000028032 <SF,EE,IR,DR,RI>  CR: 22004822  XER: 20000000
TASK = c000000006a8e6c0[900] 'test' THREAD: c0000000069d4000 CPU: 1
GPR00: 0000000000000001 c0000000069d6e90 c0000000005d5720 c0000000053cca80 
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002 
GPR08: 0000000000000000 0000000000000000 c00000000010eb60 c000000006a8e6c0 
GPR12: 0000000042004824 c000000007ffe280 0000000000000004 00000000f7891688 
GPR16: 00000000f7871734 00000000f78927a4 00000000f7893008 00000000f78927a8 
GPR20: 00000000ffa29300 000000000fd958a0 0000000000000001 000000000000000d 
GPR24: 000000000fd98240 c0000000061afb10 0000000000000008 c00000000e43b010 
GPR28: c0000000053cca20 c0000000053cca80 c000000000576b60 c0000000053cca20 
NIP [c00000000010eb70] .dput+0x48/0x214
LR [c00000000010eb60] .dput+0x38/0x214
Call Trace:
[c0000000069d6e90] [c00000000010eb60] .dput+0x38/0x214 (unreliable)
[c0000000069d6f30] [c0000000000f6214] .fput+0x24c/0x288
[c0000000069d6fe0] [c0000000000ca578] .remove_vma+0x68/0xcc
[c0000000069d7070] [c0000000000ca6ec] .exit_mmap+0x110/0x14c
[c0000000069d7190] [c00000000003d844] .mmput+0x74/0x154
[c0000000069d7220] [d0000000005ec390] .spu_forget+0x54/0x7c [spufs]
[c0000000069d72b0] [d0000000005e6294] .spufs_dir_close+0x8c/0xc8 [spufs]
[c0000000069d7360] [c0000000000f6140] .fput+0x178/0x288
[c0000000069d7410] [c0000000000f21dc] .filp_close+0xbc/0xe4
[c0000000069d74a0] [c000000000042824] .put_files_struct+0xf4/0x1b8
[c0000000069d7550] [c000000000044654] .do_exit+0x244/0x738
[c0000000069d7660] [c0000000000194d0] .die+0x34c/0x374
[c0000000069d7700] [c0000000000198a0] ._exception+0x88/0x174
[c0000000069d78c0] [c000000000005314] program_check_common+0x114/0x180
--- Exception: 700 at .dput+0x48/0x214
    LR = .dput+0x38/0x214
[c0000000069d7c50] [c0000000000f6214] .fput+0x24c/0x288
[c0000000069d7d00] [c0000000000f21dc] .filp_close+0xbc/0xe4
[c0000000069d7d90] [c0000000000f22d4] .SyS_close+0xd0/0x128
[c0000000069d7e30] [c0000000000097d8] syscall_exit+0x0/0x40
Instruction dump:
fb61ffd8 fb81ffe0 fba1ffe8 f821ff61 418201c8 3bbf0060 7fa3eb78 4827d375 
60000000 813f0058 7d200074 7800d182 <0b000000> 2b890001 409d0010 3809ffff 
---[ end trace 6735bcf43efb1300 ]---

Fixing recursive fault but reboot is needed!
Re: again SPU problems with Linux 3 [message #1168 is a reply to message #1166] Fri, 27 April 2012 18:43 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
They didn't fix the old bug but introduced a new one too. Well done.
You should report it to devs i think because i cannot test it withtout installing OpenCL which i don not have time for.
Re: again SPU problems with Linux 3 [message #1234 is a reply to message #1168] Fri, 04 May 2012 15:53 Go to previous messageGo to next message
masterzorag is currently offline  masterzorag
Messages: 99
Registered: August 2011
Location: maiworld
Gitbrew Member
devs committed a patch for the issue, tested on 3.3.3:
first issue gone, no exceptions in do_spu_create, but it falls again in possible circular locking dependency detected... Mad at the end, same effect of your patch.

kernel 3.4-rc1 boots (patched fine with your 3.3_patchset), but same issue

very strange that issue happens only on first run:
after first run the issue does not appears, but the program slow down as issue ever exist!

pinged devs as 'possible circular locking dependency', waiting for a fix...

update: same results with 3.4.0

[Updated on: Fri, 25 May 2012 17:09]

Report message to a moderator

Re: again SPU problems with Linux 3 [message #1424 is a reply to message #1234] Wed, 13 June 2012 11:17 Go to previous messageGo to next message
masterzorag is currently offline  masterzorag
Messages: 99
Registered: August 2011
Location: maiworld
Gitbrew Member
testing glevand's prebuilt kernels here, spufs issue is fixed until 3.2.20.
on 3.3 and 3.4 we fault in this issue, I've noticed some debug info has changed, so I post new debug
======================================================
[ INFO: possible circular locking dependency detected ]
3.3.8 #1 Not tainted
-------------------------------------------------------
perlin/997 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<d000000002c23004>] .spufs_ps_fault.isra.17+0x1e4/0x220 [spufs]

but task is already holding lock:
 (&ctx->state_mutex){+.+...}, at: [<d000000002c22fd8>] .spufs_ps_fault.isra.17+0x1b8/0x220 [spufs]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&ctx->state_mutex){+.+...}:
       [<c00000000045e964>] .mutex_lock_interruptible_nested+0x74/0x508
       [<d000000002c224f8>] .spufs_mem_mmap_fault+0xd8/0x190 [spufs]
       [<c0000000000d8138>] .__do_fault+0xc0/0x6c8
       [<c000000000027714>] .do_page_fault+0x214/0x640
       [<c0000000000059e0>] handle_page_fault+0x20/0x74

-> #0 (&mm->mmap_sem){++++++}:
       [<c000000000093934>] .lock_acquire+0x9c/0xd8
       [<c000000000460194>] .down_read+0x34/0x7c
       [<d000000002c23004>] .spufs_ps_fault.isra.17+0x1e4/0x220 [spufs]
       [<c0000000000d8138>] .__do_fault+0xc0/0x6c8
       [<c000000000027714>] .do_page_fault+0x214/0x640
       [<c0000000000059e0>] handle_page_fault+0x20/0x74

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ctx->state_mutex);
                               lock(&mm->mmap_sem);
                               lock(&ctx->state_mutex);
  lock(&mm->mmap_sem);

 *** DEADLOCK ***

1 lock held by perlin/997:
 #0:  (&ctx->state_mutex){+.+...}, at: [<d000000002c22fd8>] .spufs_ps_fault.isra.17+0x1b8/0x220 [spufs]

stack backtrace:
Call Trace:
[c000000006a176d0] [c000000000012a20] .show_stack+0x70/0x1a8 (unreliable)
[c000000006a17780] [c00000000046418c] .print_circular_bug+0x308/0x340
[c000000006a17830] [c000000000093328] .__lock_acquire+0x1ca0/0x1cd0
[c000000006a179b0] [c000000000093934] .lock_acquire+0x9c/0xd8
[c000000006a17a80] [c000000000460194] .down_read+0x34/0x7c
[c000000006a17b00] [d000000002c23004] .spufs_ps_fault.isra.17+0x1e4/0x220 [spufs]
[c000000006a17be0] [c0000000000d8138] .__do_fault+0xc0/0x6c8
[c000000006a17cf0] [c000000000027714] .do_page_fault+0x214/0x640
[c000000006a17e30] [c0000000000059e0] handle_page_fault+0x20/0x74

and here is the issue's effect: slow down runtime
uname -a
Linux fedora_clone 3.3.8 #1 SMP Tue Jun 5 18:18:00 CEST 2012 ppc64 ppc64 ppc64 GNU/Linux

./perlin 
OpenCL took 91.365402 seconds to compute 1000 frames. Pixel Rate = 11.476729 Mpixels/sec, Frame Rate = 10.945062 frames/sec
Host code took 12.527953 seconds to compute 10 frames. Pixel Rate = 0.836989 Mpixels/sec, Frame Rate = 0.798215 frames/sec
OpenCL provided a 13.711922 speedup

instead of
uname -a
Linux fedora_clone 3.2.19 #1 SMP Thu Jun 7 08:59:29 CEST 2012 ppc64 ppc64 ppc64 GNU/Linux

./perlin 
OpenCL took 22.466829 seconds to compute 1000 frames. Pixel Rate = 46.672184 Mpixels/sec, Frame Rate = 44.510064 frames/sec
Host code took 12.905138 seconds to compute 10 frames. Pixel Rate = 0.812526 Mpixels/sec, Frame Rate = 0.774885 frames/sec
OpenCL provided a 57.440851 speedup

in this case we spend more than 4x time!
(cell can vectorize by 4 and this issue, in this case, nullify the potential gain; it's not so mathematical, gain it's absolutly relative; it's only a case, but let you see the effect)
Re: again SPU problems with Linux 3 [message #1429 is a reply to message #1424] Wed, 13 June 2012 17:59 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
If you could give me a spu program which doesn't need opencl and which causes this panic to occur then i could debug it.
i will try to find time to see what is wrong here.
Re: again SPU problems with Linux 3 [message #1432 is a reply to message #1429] Wed, 13 June 2012 19:27 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Hmm, i don't see any differencies in code between 3..2 and 3.4.

http://lxr.free-electrons.com/source/arch/powerpc/platforms/ cell/spufs/file.c?v=3.4;a=powerpc
http://lxr.free-electrons.com/source/arch/powerpc/platforms/ cell/spufs/file.c?v=3.2;a=powerpc

[Updated on: Wed, 13 June 2012 19:27]

Report message to a moderator

Re: again SPU problems with Linux 3 [message #1433 is a reply to message #1432] Wed, 13 June 2012 19:31 Go to previous messageGo to next message
glevand is currently offline  glevand
Messages: 955
Registered: July 2011
Location: SONY
Gitbrew God
Try to disable locok debugging in your Linux kernel. see "Kernel hacking" menu.
The exception you get says that is a possible circular locking dependency
which means it's not necessary it is one.

[Updated on: Wed, 13 June 2012 19:31]

Report message to a moderator

Re: again SPU problems with Linux 3 [message #1459 is a reply to message #1433] Mon, 18 June 2012 13:29 Go to previous messageGo to next message
masterzorag is currently offline  masterzorag
Messages: 99
Registered: August 2011
Location: maiworld
Gitbrew Member
now on 3.4.3, disabled lock debugging: no issue pointed out by kernel
OpenCL code runs correctly, BUT
uname -a
Linux fedora_clone 3.4.3 #1 SMP Mon Jun 18 12:43:05 CEST 2012 ppc64 ppc64 ppc64 GNU/Linux
./perlin 
OpenCL took 91.444168 seconds to compute 1000 frames. Pixel Rate = 11.466844 Mpixels/sec, Frame Rate = 10.935635 frames/sec
Host code took 12.512553 seconds to compute 10 frames. Pixel Rate = 0.838019 Mpixels/sec, Frame Rate = 0.799197 frames/sec
OpenCL provided a 13.683271 speedup

the code is running as the issue ever exist, there is slowdown.

then my program:
runs correctly, but slowdown a little bit, (4% slower) but now it tells me there are 8 max compute units! Mad instead of 7!

I think they want to broken also OpenCL possibilities now? wtf are they doing?
I've to test without enumerating the 7th...
to save your time [message #1495 is a reply to message #1429] Thu, 21 June 2012 11:45 Go to previous message
masterzorag is currently offline  masterzorag
Messages: 99
Registered: August 2011
Location: maiworld
Gitbrew Member
glevand wrote on Wed, 13 June 2012 17:59
If you could give me a spu program which doesn't need opencl and which causes this panic to occur then i could debug it.
i will try to find time to see what is wrong here.

sorry but I don't have a spu program to test Confused
to save your time, I can point to my tarred FC12 image that has opencl installed yet, and samples in /home. I've linked it in wiki (about 1.3G untarred)
Previous Topic: WLAN on petitboot
Next Topic: ubuntu help
Goto Forum:
  


Current Time: Thu Jun 20 09:30:06 CEST 2013

Total time taken to generate the page: 0.02787 seconds