8 years ago  Merge branch 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg...
Linus Torvalds [Sat, 5 Sep 2009 20:57:53 +0000 (13:57 -0700)]
Merge branch 'slab/urgent' of git://git./linux/kernel/git/penberg/slab-2.6

* 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU

8 years ago  Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
Linus Torvalds [Sat, 5 Sep 2009 20:51:07 +0000 (13:51 -0700)]
Merge git://git./linux/kernel/git/agk/linux-2.6-dm

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
  dm snapshot: fix on disk chunk size validation
  dm exception store: split set_chunk_size
  dm snapshot: fix header corruption race on invalidation
  dm snapshot: refactor zero_disk_area to use chunk_io
  dm log: userspace add luid to distinguish between concurrent log instances
  dm raid1: do not allow log_failure variable to unset after being set
  dm log: remove incorrect field from userspace table output
  dm log: fix userspace status output
  dm stripe: expose correct io hints
  dm table: add more context to terse warning messages
  dm table: fix queue_limit checking device iterator
  dm snapshot: implement iterate devices
  dm multipath: fix oops when request based io fails when no paths

8 years ago  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes...
Linus Torvalds [Sat, 5 Sep 2009 20:50:46 +0000 (13:50 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI SR-IOV: correct broken resource alignment calculations

8 years ago  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
Linus Torvalds [Sat, 5 Sep 2009 20:49:06 +0000 (13:49 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix bootup with mcount in some configs.
  sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.

8 years ago  Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 5 Sep 2009 20:48:37 +0000 (13:48 -0700)]
Merge branch 'perfcounters-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter/powerpc: Fix cache event codes for POWER7
  perf_counter: Fix /0 bug in swcounters
  perf_counters: Increase paranoia level

8 years ago  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sat, 5 Sep 2009 20:41:29 +0000 (13:41 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: atkbd - add Compaq Presario R4000-series repeat quirk
  Input: i8042 - add Acer Aspire 5536 to the nomux list

8 years ago  ext2: fix unbalanced kmap()/kunmap()
Nicolas Pitre [Sat, 5 Sep 2009 04:25:37 +0000 (00:25 -0400)]
ext2: fix unbalanced kmap()/kunmap()

In ext2_rename(), dir_page is acquired through ext2_dotdot().  It is
then released through ext2_set_link() but only if old_dir != new_dir.
Failing that, the pkmap reference count is never decremented and the
page remains pinned forever.  Repeat that a couple of times with highmem
pages and all pkmap slots get exhausted, and every further kmap() call
ends up stalling on the pkmap_map_wait queue, at which point the whole
system comes to a halt.
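
The imbalance can be modelled in plain userspace C.  This is only a
sketch: demo_page, demo_kmap(), demo_kunmap() and demo_rename() are
hypothetical stand-ins, not kernel interfaces, and the "balanced" branch
is one plausible way to pair the release, not necessarily what the
actual patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pkmap pin count of one highmem page. */
struct demo_page {
    int pin_count;
};

static void demo_kmap(struct demo_page *p)
{
    p->pin_count++;
}

static void demo_kunmap(struct demo_page *p)
{
    assert(p->pin_count > 0);   /* an unmap without a map is a bug */
    p->pin_count--;
}

/*
 * Models one rename: the page is pinned first (as ext2_dotdot() does),
 * and the link update (which also unpins, as ext2_set_link() does) only
 * runs when the directories differ.  With balanced == false the
 * same-directory case never unpins.  Returns the pin count left behind.
 */
static int demo_rename(struct demo_page *dir_page, bool same_dir, bool balanced)
{
    demo_kmap(dir_page);
    if (!same_dir)
        demo_kunmap(dir_page);
    else if (balanced)
        demo_kunmap(dir_page);
    return dir_page->pin_count;
}
```

Each unbalanced same-directory rename leaves one more pin behind, which
is how a finite pool of mapping slots eventually runs dry.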

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago  Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec...
Linus Torvalds [Sat, 5 Sep 2009 20:38:37 +0000 (13:38 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jlbec/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: ocfs2_write_begin_nolock() should handle len=0
  ocfs2: invalidate dentry if its dentry_lock isn't initialized.

8 years ago  pty: don't limit the writes to 'pty_space()' inside 'pty_write()'
Linus Torvalds [Sat, 5 Sep 2009 20:27:10 +0000 (13:27 -0700)]
pty: don't limit the writes to 'pty_space()' inside 'pty_write()'

The whole write-room thing is something that is up to the _caller_ to
worry about, not the pty layer itself.  The total buffer space will
still be limited by the buffering routines themselves, so there is no
advantage or need in having pty_write() artificially limit the size.

And what happened was that the caller (the n_tty line discipline, in
this case) may have verified that there is room for 2 bytes to be
written (for NL -> CRNL expansion), and it used to then do those writes
as two single-byte writes.  And if the first byte written (CR) then
caused a new tty buffer to be allocated, pty_space() may have returned
zero when trying to write the second byte (LF), and then incorrectly
failed the write - leading to a lost newline character.

This should finally fix


Reported-by: Mikael Pettersson <mikpe@it.uu.se>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago  n_tty: do O_ONLCR translation as a single write
Linus Torvalds [Sat, 5 Sep 2009 19:46:07 +0000 (12:46 -0700)]
n_tty: do O_ONLCR translation as a single write

When translating NL to CRNL in the n_tty line discipline, we did it as
two tty_put_char() calls.  Which works, but is stupid, and has caused
problems before too with bad interactions with the write_room() logic.
The generic USB serial driver had that problem, for example.

Now the pty layer had similar issues after being moved to the generic
tty buffering code (in commit d945cb9cce20ac7143c2de8d88b187f62db99bdc:
"pty: Rework the pty layer to use the normal buffering logic").

So stop doing the silly separate two writes, and do it as a single write
instead.  That's what the n_tty layer already does for the space
expansion of tabs (XTABS), and it means that we'll now always have just
a single write for the CRNL to match the single 'tty_write_room()' test,
which hopefully means that the next time somebody screws up buffering,
it won't cause weeks of debugging.
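
To illustrate why one write matches one room check, here is a small
userspace model.  It is a sketch under stated assumptions: demo_sink,
its alloc_cost field and the helper names are invented for the example,
and the "room shrinks after a write" rule merely imitates a buffer
layer whose free space changes between calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_sink {
    size_t room;        /* bytes the sink currently reports as writable */
    size_t alloc_cost;  /* extra room lost after each accepted write (assumed) */
    char data[16];
    size_t len;
};

/* Accepts the whole write or nothing, like a writer that honours its room. */
static size_t demo_write(struct demo_sink *s, const char *buf, size_t n)
{
    if (n > s->room)
        return 0;
    assert(s->len + n <= sizeof(s->data));
    memcpy(s->data + s->len, buf, n);
    s->len += n;
    s->room -= n;
    /* Model the buffer layer eating into the remaining room. */
    s->room = s->room > s->alloc_cost ? s->room - s->alloc_cost : 0;
    return n;
}

/* Old style: one room check, then two single-byte writes. */
static size_t demo_newline_two_writes(struct demo_sink *s)
{
    size_t done;

    if (s->room < 2)
        return 0;
    done = demo_write(s, "\r", 1);
    done += demo_write(s, "\n", 1);   /* may fail: room changed underneath us */
    return done;
}

/* New style: one room check, one two-byte write. */
static size_t demo_newline_single_write(struct demo_sink *s)
{
    if (s->room < 2)
        return 0;
    return demo_write(s, "\r\n", 2);
}
```

With room for two bytes and a non-zero alloc_cost, the two-write variant
delivers only the CR, while the single write delivers both bytes.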

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago  exec: do not sleep in TASK_TRACED under ->cred_guard_mutex
Oleg Nesterov [Sat, 5 Sep 2009 18:17:13 +0000 (11:17 -0700)]
exec: do not sleep in TASK_TRACED under ->cred_guard_mutex

Tom Horsley reports that his debugger hangs when it tries to read
/proc/pid_of_tracee/maps, this happens since

"mm_for_maps: take ->cred_guard_mutex to fix the race with exec"

commit in 2.6.31.

But the root of the problem lies in the fact that do_execve() path calls
tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC.

The tracee must not sleep in TASK_TRACED holding this mutex.  Even if we
remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(),
another task doing PTRACE_ATTACH should not hang until it is killed or the
tracee resumes.

With this patch do_execve() does not use ->cred_guard_mutex directly and
we do not hold it throughout, instead:

- introduce prepare_bprm_creds() helper, it locks the mutex
  and calls prepare_exec_creds() to initialize bprm->cred.

- install_exec_creds() drops the mutex after commit_creds(),
  and thus before tracehook_report_exec()->ptrace_stop().

  or, if exec fails,

  free_bprm() drops this mutex when bprm->cred != NULL which
  indicates install_exec_creds() was not called.
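
The hand-off described above can be sketched in userspace C.  Everything
here is hypothetical (demo_task, demo_bprm and the demo_* helpers only
mirror the names in this message, with a boolean standing in for the
mutex); the point is just that the lock is released either at install
time or in the cleanup path, never held across the stop.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_task { bool guard_locked; };   /* stands in for the mutex */
struct demo_bprm { bool cred_pending; };   /* stands in for bprm->cred != NULL */

static void demo_prepare_bprm_creds(struct demo_task *t, struct demo_bprm *b)
{
    assert(!t->guard_locked);
    t->guard_locked = true;
    b->cred_pending = true;
}

static void demo_install_exec_creds(struct demo_task *t, struct demo_bprm *b)
{
    b->cred_pending = false;   /* credentials committed */
    t->guard_locked = false;   /* dropped before any tracer stop */
}

static void demo_free_bprm(struct demo_task *t, struct demo_bprm *b)
{
    if (b->cred_pending) {     /* install never ran, so we still hold the lock */
        b->cred_pending = false;
        t->guard_locked = false;
    }
}

/* Returns true if the guard was still held where the tracee could stop. */
static bool demo_execve(struct demo_task *t, bool fail_before_install)
{
    struct demo_bprm bprm = { false };
    bool held_at_stop = false;

    demo_prepare_bprm_creds(t, &bprm);
    if (!fail_before_install) {
        demo_install_exec_creds(t, &bprm);
        held_at_stop = t->guard_locked;   /* where the exec report would run */
    }
    demo_free_bprm(t, &bprm);
    return held_at_stop;
}
```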

Reported-by: Tom Horsley <tom.horsley@att.net>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago  page-allocator: always change pageblock ownership when anti-fragmentation is disabled
Mel Gorman [Sat, 5 Sep 2009 18:17:11 +0000 (11:17 -0700)]
page-allocator: always change pageblock ownership when anti-fragmentation is disabled

On low-memory systems, anti-fragmentation gets disabled as fragmentation
cannot be avoided on a sufficiently large boundary to be worthwhile.  Once
disabled, there is a period of time when all the pageblocks are marked
MOVABLE and the expectation is that they get marked UNMOVABLE at each call
to __rmqueue_fallback().

However, when MAX_ORDER is large the pageblocks do not change ownership
because the normal criteria are not met.  This has the effect of
prematurely breaking up too many large contiguous blocks.  This is most
serious on NOMMU systems which depend on high-order allocations to boot.
This patch causes pageblocks to change ownership on every fallback when
anti-fragmentation is disabled.  This prevents the large blocks being
prematurely broken up.

This is a fix to commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e [page
allocator: move check for disabled anti-fragmentation out of fastpath] and
the problem affects 2.6.31-rc8.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Tested-by: Paul Mundt <lethal@linux-sh.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago  nommu: fix error handling in do_mmap_pgoff()
David Howells [Sat, 5 Sep 2009 18:17:07 +0000 (11:17 -0700)]
nommu: fix error handling in do_mmap_pgoff()

Fix the error handling in do_mmap_pgoff().  If do_mmap_shared_file() or
do_mmap_private() fail, we jump to the error_put_region label at which
point we call __put_nommu_region() on the region - but we haven't yet
added the region to the tree, and so __put_nommu_region() may BUG
because the region tree is empty or it may corrupt the region tree.

To get around this, we can afford to add the region to the region tree
before calling do_mmap_shared_file() or do_mmap_private() as we keep
nommu_region_sem write-locked, so no-one can race with us by seeing a
transient region.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago  workqueues: introduce __cancel_delayed_work()
Oleg Nesterov [Sat, 5 Sep 2009 18:17:06 +0000 (11:17 -0700)]
workqueues: introduce __cancel_delayed_work()

cancel_delayed_work() has to use del_timer_sync() to guarantee the timer
function is not running after return.  But most users don't actually
need this, and del_timer_sync() has problems: it is not usable from
interrupt context, and it depends on every lock which could be taken from irq.

Introduce __cancel_delayed_work() which calls del_timer() instead.

The immediate reason for this patch is
but hopefully this helper makes sense anyway.

As for 13757 bug, actually we need requeue_delayed_work(), but its
semantics are not yet clear.

Merge this patch early to resolve cross-tree interdependencies between
input and infiniband.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago  ocfs2: ocfs2_write_begin_nolock() should handle len=0
Sunil Mushran [Fri, 4 Sep 2009 18:12:01 +0000 (11:12 -0700)]
ocfs2: ocfs2_write_begin_nolock() should handle len=0

Bug introduced by mainline commit e7432675f8ca868a4af365759a8d4c3779a3d922
The bug causes ocfs2_write_begin_nolock() to oops when len=0.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
8 years ago  dm snapshot: fix on disk chunk size validation
Mikulas Patocka [Fri, 4 Sep 2009 19:40:43 +0000 (20:40 +0100)]
dm snapshot: fix on disk chunk size validation

Fix some problems seen in the chunk size processing when activating a
pre-existing snapshot.

For a new snapshot, the chunk size can either be supplied by the creator
or a default value can be used.  For an existing snapshot, the
chunk size in the snapshot header on disk should always be used.

If someone attempts to load an existing snapshot and has the 'default
chunk size' option set, the kernel uses its default value even when it
is incorrect for the snapshot being loaded.  This patch ensures the
correct on-disk value is always used.

Secondly, when the code does use the chunk size stored on the disk it is
prudent to revalidate it, so the code can exit cleanly if it got
corrupted as happened in
https://bugzilla.redhat.com/show_bug.cgi?id=461506 .
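
A minimal sketch of the two rules follows: prefer the on-disk value
for an existing snapshot, and revalidate whatever is chosen.  The helper
names are hypothetical, and the validity test (non-zero power of two,
in sectors) is an assumption made for illustration; the message above
does not spell out the exact criteria the kernel applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed criterion: a usable chunk size is a non-zero power of two. */
static bool demo_chunk_size_ok(uint32_t sectors)
{
    return sectors != 0 && (sectors & (sectors - 1)) == 0;
}

/*
 * For an existing snapshot the on-disk value wins over any requested or
 * default value; for a new snapshot the requested value is used.
 * Returns 0 and stores the size on success, -1 if the value is unusable
 * (in which case *out is left untouched).
 */
static int demo_pick_chunk_size(bool existing, uint32_t on_disk,
                                uint32_t requested, uint32_t *out)
{
    uint32_t chosen = existing ? on_disk : requested;

    assert(out != NULL);
    if (!demo_chunk_size_ok(chosen))
        return -1;
    *out = chosen;
    return 0;
}
```

A corrupted header with a zero chunk size is rejected up front instead
of being used later.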

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm exception store: split set_chunk_size
Mikulas Patocka [Fri, 4 Sep 2009 19:40:41 +0000 (20:40 +0100)]
dm exception store: split set_chunk_size

Break the function set_chunk_size into two functions in preparation for
the fix in the following patch.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm snapshot: fix header corruption race on invalidation
Mikulas Patocka [Fri, 4 Sep 2009 19:40:39 +0000 (20:40 +0100)]
dm snapshot: fix header corruption race on invalidation

If a persistent snapshot fills up, a race can corrupt the on-disk header
which causes a crash on any future attempt to activate the snapshot
(typically while booting).  This patch fixes the race.

When the snapshot overflows, __invalidate_snapshot is called, which calls
snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
calls write_header. write_header constructs the new header in the "area"

Concurrently, an existing kcopyd job may finish, call copy_callback
and commit_exception method, that goes to persistent_commit_exception.
persistent_commit_exception doesn't do locking, relying on the fact that
callbacks are single-threaded, but it can race with snapshot invalidation and
overwrite the header that is just being written while the snapshot is being

The result of this race is a corrupted header being written that can
lead to a crash on further reactivation (if chunk_size is zero in the
corrupted header).

The fix is to use separate memory areas for each.
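
The effect of sharing one buffer can be shown with a deterministic
replay of the interleaving.  This is a userspace sketch only: demo_store
and its fields are hypothetical, and the three steps are run
sequentially in the order the race would produce them rather than on
real threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_AREA = 8 };

struct demo_store {
    char area[DEMO_AREA];         /* scratch buffer for exception metadata */
    char header_area[DEMO_AREA];  /* dedicated header buffer (the fix) */
    char disk_header[DEMO_AREA];  /* what ends up on disk */
};

/*
 * Replays the racy ordering: the header is built, a concurrent commit
 * then fills the metadata buffer, and only afterwards is the header
 * written out.  With separate == false both users share 'area'.
 */
static void demo_interleaved_invalidate(struct demo_store *s, bool separate)
{
    char *hdr;

    assert(s != NULL);
    hdr = separate ? s->header_area : s->area;
    memset(hdr, 'H', DEMO_AREA);             /* step 1: header constructed */
    memset(s->area, 'E', DEMO_AREA);         /* step 2: racing commit writes metadata */
    memcpy(s->disk_header, hdr, DEMO_AREA);  /* step 3: header goes to disk */
}
```

With a shared buffer the on-disk header ends up holding exception data;
with a dedicated buffer it stays intact.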

See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm snapshot: refactor zero_disk_area to use chunk_io
Mikulas Patocka [Fri, 4 Sep 2009 19:40:37 +0000 (20:40 +0100)]
dm snapshot: refactor zero_disk_area to use chunk_io

Refactor chunk_io to prepare for the fix in the following patch.

Pass an area pointer to chunk_io and simplify zero_disk_area to use
chunk_io.  No functional change.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm log: userspace add luid to distinguish between concurrent log instances
Jonathan Brassow [Fri, 4 Sep 2009 19:40:34 +0000 (20:40 +0100)]
dm log: userspace add luid to distinguish between concurrent log instances

Device-mapper userspace logs (like the clustered log) are
identified by a universally unique identifier (UUID).  This
identifier is used to associate requests from the kernel to
a specific log in userspace.  The UUID must be unique everywhere,
since multiple machines may use this identifier when communicating
about a particular log, as is the case for cluster logs.

Sometimes, device-mapper/LVM may re-use a UUID.  This is the
case during pvmoves, when moving from one segment of an LV
to another, or when resizing a mirror, etc.  In these cases,
a new log is created with the same UUID and loaded in the
"inactive" slot.  When a device-mapper "resume" is issued,
the "live" table is deactivated and the new "inactive" table
becomes "live".  (The "inactive" table can also be removed
via a device-mapper 'clear' command.)

The above two issues were colliding.  More than one log was being
created with the same UUID, and there was no way to distinguish
between them.  So, sometimes the wrong log would be swapped
out during the exchange.

The solution is to create a locally unique identifier,
'luid', to go along with the UUID.  This new identifier is used
to determine exactly which log is being referenced by the kernel
when the log exchange is made.  The identifier is not
universally safe, but it does not need to be, since
create/destroy/suspend/resume operations are bound to a specific
machine; and these are the operations that make up the exchange.
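
As a rough illustration, a lookup keyed on the (UUID, luid) pair can
tell apart two logs that share a UUID.  The structure and function
below are hypothetical and are not the userspace log daemon's actual
data model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_log {
    const char *uuid;   /* universally unique, but may be re-used locally */
    uint64_t luid;      /* unique only on this machine */
};

/* Returns the log matching both identifiers, or NULL if there is none. */
static const struct demo_log *demo_find_log(const struct demo_log *logs,
                                            size_t n, const char *uuid,
                                            uint64_t luid)
{
    assert(uuid != NULL);
    for (size_t i = 0; i < n; i++) {
        if (logs[i].luid == luid && strcmp(logs[i].uuid, uuid) == 0)
            return &logs[i];
    }
    return NULL;
}
```

Matching on the UUID alone would return whichever of the "live" and
"inactive" logs happened to come first.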

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm raid1: do not allow log_failure variable to unset after being set
Jonathan Brassow [Fri, 4 Sep 2009 19:40:32 +0000 (20:40 +0100)]
dm raid1: do not allow log_failure variable to unset after being set

This patch fixes a bug which was triggering a case where the primary leg
could not be changed on failure even when the mirror was in-sync.

The case involves the failure of the primary device along with
the transient failure of the log device.  The problem is that
bios can be put on the 'failures' list (due to log failure)
before 'fail_mirror' is called due to the primary device failure.
Normally, this is fine, but if the log device failure is transient,
a subsequent iteration of the work thread, 'do_mirror', will
reset 'log_failure'.  The 'do_failures' function then resets
the 'in_sync' variable when processing bios on the failures list.
The 'in_sync' variable is what is used to determine if the
primary device can be switched in the event of a failure.  Since
this has been reset, the primary device is incorrectly assumed
to be not switchable.

The case has been seen in the cluster mirror context, where one
machine realizes the log device is dead before the other machines.
As the responsibilities of the server migrate from one node to
another (because the mirror is being reconfigured due to the failure),
the new server may think for a moment that the log device is fine -
thus resetting the 'log_failure' variable.

In any case, it is inappropriate for us to reset the 'log_failure'
variable.  The above bug simply illustrates that it can actually
hurt us.
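
In code terms the intent is a sticky flag.  The snippet below is a
sketch with hypothetical names (demo_mirror_set, demo_note_flush_result)
and does not reproduce the dm-raid1 source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_mirror_set {
    bool log_failure;
};

/*
 * Records the outcome of a log flush.  A failure latches the flag; a
 * later success deliberately does not clear it, so a transient log
 * failure cannot be forgotten by a subsequent pass of the worker.
 */
static void demo_note_flush_result(struct demo_mirror_set *ms, bool flush_failed)
{
    assert(ms != NULL);
    if (flush_failed)
        ms->log_failure = true;
}
```

The buggy behaviour corresponds to assigning the latest result directly
to the flag, which lets a later success overwrite an earlier failure.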

Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm log: remove incorrect field from userspace table output
Jonathan Brassow [Fri, 4 Sep 2009 19:40:30 +0000 (20:40 +0100)]
dm log: remove incorrect field from userspace table output

The output of 'dmsetup table' includes an internal field that should not
be there.  This patch removes it.  To make the fix simpler, we first
reorder a constructor argument.

The 'device size' argument is generated internally.  Currently it is
placed as the last space-separated word of the constructor string.
However, we need to use a version of the string without this word, so we
move it to the beginning instead so it is trivial to skip past it.
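
With the internal word placed first, skipping it is a one-line string
operation.  The helper below is a hypothetical sketch, and the sample
strings used with it are invented; the real constructor string layout
is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns a pointer just past the first space-separated word of the
 * saved constructor string, or a pointer to the terminating NUL if the
 * string holds only one word.
 */
static const char *demo_skip_first_word(const char *ctr_args)
{
    const char *space;

    assert(ctr_args != NULL);
    space = strchr(ctr_args, ' ');
    return space ? space + 1 : ctr_args + strlen(ctr_args);
}
```

Stripping a trailing word would instead need a copy or an in-place
edit of the saved string, which is why moving it to the front is
simpler.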

We keep a copy of the arguments passed to userspace for creating a log,
just in case we need to resend them.  These are the same arguments that
are desired in the STATUSTYPE_TABLE request, except for one.  When
creating the userspace log, the userspace daemon must know the size of
the mirror, so that is added to the arguments given in the constructor
table.  We were printing this extra argument out as well, which is a

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm log: fix userspace status output
Jonathan Brassow [Fri, 4 Sep 2009 19:40:28 +0000 (20:40 +0100)]
dm log: fix userspace status output

Fix 'dmsetup table' output.

There is a missing ' ' at the end of the string causing two
words to run together.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm stripe: expose correct io hints
Mike Snitzer [Fri, 4 Sep 2009 19:40:25 +0000 (20:40 +0100)]
dm stripe: expose correct io hints

Set sensible I/O hints for striped DM devices in the topology
infrastructure added for 2.6.31 for userspace tools to
obtain via sysfs.

Add .io_hints to 'struct target_type' to allow the I/O hints portion
(io_min and io_opt) of the 'struct queue_limits' to be set by each
target and implement this for dm-stripe.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm table: add more context to terse warning messages
Mike Snitzer [Fri, 4 Sep 2009 19:40:24 +0000 (20:40 +0100)]
dm table: add more context to terse warning messages

A couple of recent warning messages make it difficult for the reader to
determine exactly what is wrong.  This patch adds more information to
those messages.

The messages were added by these commits:
  5dea271b6d87bd1d79a59c1d5baac2596a841c37 ("dm table: pass correct dev area size
to device_area_is_valid")
  ea9df47cc92573b159ef3b4fda516c32cba9c4fd ("dm table: fix blk_stack_limits arg
to use bytes not sectors")

The patch also corrects references to logical_block_size in printk format
strings from %hu to %u.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm table: fix queue_limit checking device iterator
Mikulas Patocka [Fri, 4 Sep 2009 19:40:22 +0000 (20:40 +0100)]
dm table: fix queue_limit checking device iterator

The logic to check for valid device areas is inverted relative to proper
use with iterate_devices.

The iterate_devices method calls its callback for every underlying
device in the target.  If any callback returns non-zero, iterate_devices
exits immediately.  But the callback device_area_is_valid() returns 0 on
error and 1 on success.  Without this patch, the overall effect is that an
error is issued only if every device is invalid.

This patch renames device_area_is_valid to device_area_is_invalid and
inverts the logic so that one invalid device is sufficient to raise
an error.
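
The inversion is easy to reproduce with a toy iterator that stops at
the first non-zero callback result.  This is a sketch: the demo_*
functions are hypothetical simplifications, and each device is reduced
to a single "area is fine" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*demo_dev_fn)(int dev_ok);

/* Calls fn for each device and returns the first non-zero result, else 0. */
static int demo_iterate_devices(const int *devs, size_t n, demo_dev_fn fn)
{
    assert(fn != NULL);
    for (size_t i = 0; i < n; i++) {
        int r = fn(devs[i]);
        if (r)
            return r;
    }
    return 0;
}

static int demo_area_is_valid(int dev_ok)   { return dev_ok ? 1 : 0; }
static int demo_area_is_invalid(int dev_ok) { return dev_ok ? 0 : 1; }

/* Old logic: the iterator stops at the first VALID device. */
static bool demo_old_check_fails(const int *devs, size_t n)
{
    return demo_iterate_devices(devs, n, demo_area_is_valid) == 0;
}

/* New logic: the iterator stops at the first INVALID device. */
static bool demo_new_check_fails(const int *devs, size_t n)
{
    return demo_iterate_devices(devs, n, demo_area_is_invalid) != 0;
}
```

For a table with one good and one bad device, the old check reports no
error while the new check does.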

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm snapshot: implement iterate devices
Mike Snitzer [Fri, 4 Sep 2009 19:40:19 +0000 (20:40 +0100)]
dm snapshot: implement iterate devices

Implement the .iterate_devices for the origin and snapshot targets.
dm-snapshot's lack of .iterate_devices resulted in the inability to
properly establish queue_limits for both targets.

With 4K sector drives: an unfortunate side-effect of not establishing
proper limits in either target's DM device was that IO to the devices
would fail even though both had been created without error.

Commit af4874e03ed82f050d5872d8c39ce64bf16b5c38 ("dm target:s introduce
iterate devices fn") in 2.6.31-rc1 should have implemented .iterate_devices
for dm-snap.c's origin and snapshot targets.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  dm multipath: fix oops when request based io fails when no paths
Kiyoshi Ueda [Fri, 4 Sep 2009 19:40:16 +0000 (20:40 +0100)]
dm multipath: fix oops when request based io fails when no paths

The patch posted at http://marc.info/?l=dm-devel&m=124539787228784&w=2
which was merged into cec47e3d4a861e1d942b3a580d0bbef2700d2bb2 ("dm:
prepare for request based option") introduced a regression in
request-based dm.

If map_request() calls dm_kill_unmapped_request() to complete a cloned
bio without dispatching it, clone->bio is still set when
dm_end_request() is called and the BUG_ON(clone->bio) is incorrect.

The patch fixes this bug by freeing bio in dm_end_request() if the clone
has bio.  I've redone my tests to cover all I/O paths and confirmed
there's no other regression.

Here is the oops I hit in request-based dm when I do I/O to a multipath
device which has neither an active path nor the queue_if_no_path setting:

------------[ cut here ]------------
kernel BUG at /root/2.6.31-rc4.rqdm/drivers/md/dm.c:828!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_service_time dm_multipath scsi_dh dm_mod video output sbs sbshc battery ac sg sr_mod e1000e button cdrom serio_raw rtc_cmos rtc_core rtc_lib piix lpfc scsi_transport_fc ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.31-rc4.rqdm #1 Express5800/120Lj [N8100-1417]
RIP: 0010:[<ffffffffa023629d>]  [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod]
RSP: 0018:ffff8800280a1f08  EFLAGS: 00010282
RAX: ffffffffa02544e0 RBX: ffff8802aa1111d0 RCX: ffff8802aa1111e0
RDX: ffff8802ab913e70 RSI: 0000000000000000 RDI: ffff8802ab913e70
RBP: ffff8800280a1f28 R08: ffffc90005457040 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffb
R13: ffff8802ab913e88 R14: ffff8802ab9c1438 R15: 0000000000000100
FS:  0000000000000000(0000) GS:ffff88002809e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000003d54a98640 CR3: 000000029f0a1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/1 (pid: 7, threadinfo ffff8802ae50e000, task ffff8802ae4f8040)
 ffff8800280a1f38 0000000000000020 ffffffff814f30a0 0000000000000004
<0> ffff8800280a1f58 ffffffff8116b245 ffff8800280a1f38 ffff8800280a1f38
<0> ffff8800280a1f58 0000000000000001 ffff8800280a1fa8 ffffffff810477bc
Call Trace:
 [<ffffffff8116b245>] blk_done_softirq+0x75/0x90
 [<ffffffff810477bc>] __do_softirq+0xcc/0x210
 [<ffffffff81047170>] ? ksoftirqd+0x0/0x110
 [<ffffffff8100ce7c>] call_softirq+0x1c/0x50
 [<ffffffff8100e785>] do_softirq+0x65/0xa0
 [<ffffffff81047170>] ? ksoftirqd+0x0/0x110
 [<ffffffff810471e0>] ksoftirqd+0x70/0x110
 [<ffffffff81059559>] kthread+0x99/0xb0
 [<ffffffff8100cd7a>] child_rip+0xa/0x20
 [<ffffffff8100c73c>] ? restore_args+0x0/0x30
 [<ffffffff810594c0>] ? kthread+0x0/0xb0
 [<ffffffff8100cd70>] ? child_rip+0x0/0x20
Code: 44 89 e6 48 89 df e8 23 fb f2 e0 be 01 00 00 00 4c 89 f7 e8 f6 fd ff ff 5b 41 5c 41 5d 41 5e c9 c3 4c 89 ef e8 85 fe ff ff eb ed <0f> 0b eb fe 41 8b 85 dc 00 00 00 48 83 bb 10 01 00 00 00 89 83
RIP  [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod]
 RSP <ffff8800280a1f08>
---[ end trace 16af0a1d8542da55 ]---

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
8 years ago  sparc64: Fix bootup with mcount in some configs.
David S. Miller [Fri, 4 Sep 2009 10:38:54 +0000 (03:38 -0700)]
sparc64: Fix bootup with mcount in some configs.

Functions invoked early when booting up a cpu can't use
tracing because mcount requires a valid 'current_thread_info()'
and TLB mappings to be set up.

The code path of sun4v_register_mondo_queues --> register_one_mondo
is one such case.  sun4v_register_mondo_queues already has the
necessary 'notrace' annotation, but register_one_mondo does not.

Normally register_one_mondo is inlined so the bug doesn't trigger,
but with some config/compiler combinations, it won't be so we
must properly mark it notrace.

While we're here, add 'notrace' annotations to prom_printf and
prom_halt so that early error handling won't have the same problem.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Reported-by: Leif Sawyer <lsawyer@gci.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  Input: atkbd - add Compaq Presario R4000-series repeat quirk
Dave Andrews [Fri, 4 Sep 2009 00:21:27 +0000 (17:21 -0700)]
Input: atkbd - add Compaq Presario R4000-series repeat quirk

Compaq Presario R4000-series laptops are not sending a "volume up button
release" and "volume down button release" signal in the PS/2 protocol for
atkbd. The URL below has some confirmed reports:


Signed-off-by: Dave Andrews <jetdog330@hotmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
8 years ago  slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU
Eric Dumazet [Thu, 3 Sep 2009 19:38:59 +0000 (22:38 +0300)]
slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU

kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close() and
*before* sysfs_slab_remove() or risk rcu_free_slab() being called after
kmem_cache is deleted (kfreed).

rmmod nf_conntrack can crash the machine because it has to kmem_cache_destroy()
a SLAB_DESTROY_BY_RCU enabled cache.

Cc: <stable@kernel.org>
Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
8 years ago  sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.
David S. Miller [Thu, 3 Sep 2009 09:35:20 +0000 (02:35 -0700)]
sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.

This is a compromise and a temporary workaround for bootup NMI
watchdog triggers some people see with qla2xxx devices present.

This happens when, for example:

CPU 0 is in the driver init and looping submitting mailbox commands to
load the firmware, then waiting for completion.

CPU 1 is receiving the device interrupts.  CPU 1 is where the NMI
watchdog triggers.

CPU 0 is submitting mailbox commands fast enough that by the time CPU
1 returns from the device interrupt handler, a new one is pending.
This sequence runs for more than 5 seconds.

The problematic case is CPU 1's timer interrupt running when the
barrage of device interrupts begins.  Then we have:

timer interrupt
return for softirq checking
pending, thus enable interrupts

 qla2xxx interrupt
 qla2xxx interrupt
 ... 5+ seconds pass
 final qla2xxx interrupt for fw load

run timer softirq

At some point in the multi-second qla2xxx interrupt storm we trigger
the NMI watchdog on CPU 1 from the NMI interrupt handler.

The timer softirq, once we get back to running it, is smart enough to
run the timer work enough times to make up for the missed timer

However, the NMI watchdogs (both x86 and sparc) use the timer
interrupt count to notice the cpu is wedged.  But in the above
scenario we'll receive only one such timer interrupt even if we last
all the way back to running the timer softirq.

The default watchdog trigger point is only 5 seconds, which is pretty
low (the softwatchdog triggers at 60 seconds).  So increase it to 30
seconds for now.
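
The detection rule can be sketched as follows.  This is a userspace
model under assumptions: demo_watchdog_fires() is hypothetical, and it
assumes the watchdog samples the timer interrupt count once per second,
which the text above does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * irq_counts[i] is the cumulative timer interrupt count seen at the
 * i-th watchdog check.  The watchdog fires once the count has stayed
 * unchanged for 'limit' consecutive checks.
 */
static bool demo_watchdog_fires(const unsigned *irq_counts, size_t n,
                                unsigned limit)
{
    unsigned stale = 0;

    assert(limit > 0);
    for (size_t i = 1; i < n; i++) {
        if (irq_counts[i] == irq_counts[i - 1]) {
            if (++stale >= limit)
                return true;
        } else {
            stale = 0;   /* progress seen, start counting again */
        }
    }
    return false;
}
```

A stall of six checks trips a limit of 5 but not a limit of 30, which
is the trade-off this workaround makes.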

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  perf_counter/powerpc: Fix cache event codes for POWER7
Paul Mackerras [Thu, 3 Sep 2009 01:52:02 +0000 (11:52 +1000)]
perf_counter/powerpc: Fix cache event codes for POWER7

I had the codes for L1 D-cache load accesses and misses swapped
around, and the wrong codes for LL-cache accesses and misses.
This corrects them.

Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
LKML-Reference: <19103.8514.709300.585484@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
8 years ago  autofs4 - fix missed case when changing to use struct path
Ian Kent [Tue, 1 Sep 2009 03:26:22 +0000 (11:26 +0800)]
autofs4 - fix missed case when changing to use struct path

In the recent change by Al Viro that changes various subsystems
to use "struct path" one case was missed in the autofs4 module
which causes mounts to no longer expire.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'fix/hda' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Tue, 1 Sep 2009 03:36:10 +0000 (17:36 -1000)]
Merge branch 'fix/hda' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'fix/hda' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix MacBookPro 3,1/4,1 quirk with ALC889A
  ALSA: hda - Add missing mux check for VT1708

8 years agoMerge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Tue, 1 Sep 2009 03:31:02 +0000 (17:31 -1000)]
Merge branch 'for_linus' of git://git./linux/kernel/git/mchehab/linux-2.6

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
  V4L/DVB (12564a): MAINTAINERS: Update gspca sn9c20x name style
  V4L/DVB (12502): gspca - sn9c20x: Fix gscpa sn9c20x build errors.
  V4L/DVB (12495): em28xx: Don't call em28xx_ir_init when disable_ir is true
  V4L/DVB (12457): zr364: wrong indexes
  V4L/DVB (12451): Update KConfig File to enable SDIO and USB interfaces
  V4L/DVB (12450): Siano: Fixed SDIO compilation bugs
  V4L/DVB (12449): adds webcam for Micron device MT9M111 0x143A to em28xx
  V4L/DVB (12446): sms1xxx: restore GPIO functionality for all Hauppauge devices

8 years agolmb: Also remove __init from lmb_end_of_RAM() declaration in lmb.h
Benjamin Herrenschmidt [Mon, 31 Aug 2009 03:48:16 +0000 (13:48 +1000)]
lmb: Also remove __init from lmb_end_of_RAM() declaration in lmb.h

My previous patch (commit 4f8ee2c9cc: "lmb: Remove __init from
lmb_end_of_DRAM()") removed __init in lmb.c but missed the fact that it
was also marked as such in the .h

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoata_piix: parallel scanning on PATA needs an extra locking
Bartlomiej Zolnierkiewicz [Sun, 30 Aug 2009 12:56:30 +0000 (14:56 +0200)]
ata_piix: parallel scanning on PATA needs an extra locking

Commit log for commit 517d3cc15b36392e518abab6bacbb72089658313
("[libata] ata_piix: Enable parallel scan") says:

    This patch turns on parallel scanning for the ata_piix driver.
    This driver is used on most netbooks (no AHCI for cheap storage it seems).
    The scan is the dominating time factor in the kernel boot for these
    devices; with this flag it gets cut in half for the device I used
    for testing (eeepc).
    Alan took a look at the driver source and concluded that it ought to be safe
    to do for this driver.  Alan has also checked with the hardware team.

and it is all true but once we put all things together additional
constraints for PATA controllers show up (some hardware registers
have per-host not per-port atomicity) and we risk misprogramming
the controller.

I used the following test to check whether the issue is real:

  @@ -736,8 +736,20 @@ static void piix_set_piomode(struct ata_
                           (timings[pio][1] << 8);
           pci_write_config_word(dev, master_port, master_data);
  -        if (is_slave)
  +        if (is_slave) {
  +                if (ap->port_no == 0) {
  +                        u8 tmp = slave_data;
  +                        while (slave_data == tmp) {
  +                                pci_read_config_byte(dev, slave_port, &tmp);
  +                                msleep(50);
  +                        }
  +                        dev_printk(KERN_ERR, &dev->dev, "PATA parallel scan "
  +                                   "race detected\n");
  +                }
                   pci_write_config_byte(dev, slave_port, slave_data);
  +        }

           /* Ensure the UDMA bit is off - it will be turned back on if
              UDMA is selected */

and it indeed triggered the error message.

Let's fix all such races by adding extra locking to the ->set_piomode
and ->set_dmamode methods for PATA controllers.

[ Alan: would be better to take the host lock in libata-core for these
  cases so that we fix all the adapters in one swoop.  "Looks fine as a
  temporary quickfix tho" ]
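
A minimal sketch of why a host-wide lock is needed, assuming a simplified
model in which one config byte holds a timing nibble for each port.  The
names (struct host, set_slave_timing) are hypothetical and a C11
atomic_flag stands in for the kernel spinlock; this is not the driver's
code.  The read-modify-write of the shared byte happens entirely inside
the lock, so a concurrent update for the other port cannot be lost:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: one register byte shared by both ports of a host. */
struct host {
	atomic_flag lock;  /* stands in for a host-wide spinlock */
	uint8_t slave_reg; /* low nibble: port 0, high nibble: port 1 */
};

static void host_lock(struct host *h)
{
	while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
		; /* spin until the lock is free */
}

static void host_unlock(struct host *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Update one port's nibble without disturbing the other port's. */
static void set_slave_timing(struct host *h, int port_no, uint8_t timing)
{
	uint8_t v;

	assert(port_no == 0 || port_no == 1);

	host_lock(h);
	v = h->slave_reg;                                  /* read */
	if (port_no == 0)
		v = (uint8_t)((v & 0xf0) | (timing & 0x0f)); /* modify port 0 */
	else
		v = (uint8_t)((v & 0x0f) | ((timing & 0x0f) << 4));
	h->slave_reg = v;                                  /* write */
	host_unlock(h);
}
```

Without the lock, two ports could both read the old byte and the later
write would silently undo the earlier one.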

Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt...
Linus Torvalds [Tue, 1 Sep 2009 03:22:10 +0000 (17:22 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/anholt/drm-intel

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
  drm/i915: Improve CRTDDC mapping by using VBT info
  drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.
  drm/i915: Set crtc/clone mask in different output devices
  drm/i915: Always use SDVO_B detect bit for SDVO output detection.
  drm/i915: Fix typo that broke SVID1 in intel_sdvo_multifunc_encoder()
  drm/i915: Check if BIOS enabled dual-channel LVDS on 8xx, not only on 9xx
  drm/i915: Set the multiplier for SDVO on G33 platform

8 years agoALSA: hda - Fix MacBookPro 3,1/4,1 quirk with ALC889A
Takashi Iwai [Mon, 31 Aug 2009 06:15:26 +0000 (08:15 +0200)]
ALSA: hda - Fix MacBookPro 3,1/4,1 quirk with ALC889A

This patch fixes the wrong headphone output routing for MacBookPro 3,1/4,1
quirk with ALC889A codec, which caused the silent headphone output.
Also, this gives the individual Headphone and Speaker volume controls.

Reference: kernel bug#14078

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: <stable@kernel.org>
8 years agoALSA: hda - Add missing mux check for VT1708
Takashi Iwai [Mon, 31 Aug 2009 06:12:29 +0000 (08:12 +0200)]
ALSA: hda - Add missing mux check for VT1708

In patch_vt1708(), the check of MUX nids is missing and this results in
the -EINVAL error in accessing Input Source mixer element.  Simply
adding the call of get_mux_nids() fixes the problem.

Reference: Novell bnc#534904

Signed-off-by: Takashi Iwai <tiwai@suse.de>
8 years agoV4L/DVB (12564a): MAINTAINERS: Update gspca sn9c20x name style
Joe Perches [Sun, 16 Aug 2009 23:03:51 +0000 (20:03 -0300)]
V4L/DVB (12564a): MAINTAINERS: Update gspca sn9c20x name style

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
8 years agoV4L/DVB (12502): gspca - sn9c20x: Fix gscpa sn9c20x build errors.
Randy Dunlap [Wed, 26 Aug 2009 06:34:16 +0000 (03:34 -0300)]
V4L/DVB (12502): gspca - sn9c20x: Fix gscpa sn9c20x build errors.

Reported-by: Toralf Forster <toralf.foerster@gmx.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
8 years agoV4L/DVB (12495): em28xx: Don't call em28xx_ir_init when disable_ir is true
Shine Liu [Fri, 21 Aug 2009 02:49:26 +0000 (23:49 -0300)]
V4L/DVB (12495): em28xx: Don't call em28xx_ir_init when disable_ir is true

We should call em28xx_ir_init(dev) only when disable_ir is false.

Signed-off-by: Shine Liu <shinel@foxmail.com>
Reviewed-by: Devin Heitmueller <dheitmueller@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
8 years agoV4L/DVB (12457): zr364: wrong indexes
Roel Kluin [Tue, 11 Aug 2009 11:10:25 +0000 (08:10 -0300)]
V4L/DVB (12457): zr364: wrong indexes

The order of indexes is reversed

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Antoine Jacquet <royale@zerezo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
8 years agoV4L/DVB (12451): Update KConfig File to enable SDIO and USB interfaces
Udi Atar [Thu, 13 Aug 2009 19:30:25 +0000 (16:30 -0300)]
V4L/DVB (12451): Update KConfig File to enable SDIO and USB interfaces

Update KConfig file to enable selection of SDIO and USB
interfaces, and add dependency on relevant modules.

[mchehab@redhat.com: fix merge conflicts, remove default: m, add missing endmenu]

Signed-off-by: Udi Atar <udia@siano-ms.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
8 years agoV4L/DVB (12450): Siano: Fixed SDIO compilation bugs
Udi Atar [Sun, 28 Jun 2009 07:22:55 +0000 (04:22 -0300)]
V4L/DVB (12450): Siano: Fixed SDIO compilation bugs

Fixed SDIO compilation bugs
Also fixed a memory overrun issue in buffer management.

Signed-off-by: Udi Atar <udia@siano-ms.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
8 years agoV4L/DVB (12449): adds webcam for Micron device MT9M111 0x143A to em28xx
Mauro Carvalho Chehab [Wed, 12 Aug 2009 23:21:44 +0000 (20:21 -0300)]
V4L/DVB (12449): adds webcam for Micron device MT9M111 0x143A to em28xx

[mchehab@redhat.com: fix merge conflict and a few CodingStyle issues]
Signed-off-by: Steve Gotthardt <gotthardt@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
8 years agoV4L/DVB (12446): sms1xxx: restore GPIO functionality for all Hauppauge devices
Michael Krufky [Mon, 13 Jul 2009 02:30:14 +0000 (23:30 -0300)]
V4L/DVB (12446): sms1xxx: restore GPIO functionality for all Hauppauge devices

Previous changesets broke Hauppauge devices and their GPIO configurations.

This changeset restores the LED & LNA functionality.

Signed-off-by: Michael Krufky <mkrufky@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
8 years agoPCI SR-IOV: correct broken resource alignment calculations
Chris Wright [Fri, 28 Aug 2009 20:00:06 +0000 (13:00 -0700)]
PCI SR-IOV: correct broken resource alignment calculations

An SR-IOV capable device includes an SR-IOV PCIe capability which
describes the Virtual Function (VF) BAR requirements.  A typical SR-IOV
device can support multiple VFs whose BARs must be in a contiguous region,
effectively an array of VF BARs.  The BAR reports the size requirement
for a single VF.  We calculate the full range needed by simply multiplying
the VF BAR size with the number of possible VFs and create a resource
spanning the full range.

This all seems sane enough except it artificially inflates the alignment
requirement for the VF BAR.  The VF BAR need only be aligned to the size
of a single BAR not the contiguous range of VF BARs.  This can cause us
to fail to allocate resources for the BAR despite the fact that we
actually have enough space.

This patch adds a thin PCI specific layer over the generic
resource_alignment() function which is aware of the special nature of
VF BARs and does sorting and allocation based on the smaller alignment
requirement.

I recognize that while resource_alignment is generic, it's basically a
PCI helper.  An alternative to this patch is to add PCI VF BAR specific
information to struct resource.  I opted for the extra layer rather than
adding such PCI specific information to struct resource.  This does
have the slight downside that we don't cache the BAR size and re-read
for each alignment query (happens a small handful of times during boot
for each VF BAR).
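
A small worked example of the arithmetic, using made-up numbers and
hypothetical helper names (align_up, vf_window_size) rather than the
kernel's own functions.  With a 16 KiB VF BAR and 8 VFs the window is
128 KiB.  Aligning that window to its full size can push it past the end
of a free region that it would fit into when aligned to a single BAR:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align is a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Size of the contiguous window holding one BAR per possible VF. */
static uint64_t vf_window_size(uint64_t vf_bar_size, unsigned int total_vfs)
{
	return vf_bar_size * total_vfs;
}
```

Take a free region from 0x104000 to 0x130000.  Aligned to the 0x20000
window size, the window would start at 0x120000 and end at 0x140000,
which does not fit.  Aligned to the 0x4000 BAR size, it starts at
0x104000 and ends at 0x124000, which does.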

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Yu Zhao <yu.zhao@intel.com>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
8 years agodrm/i915: Improve CRTDDC mapping by using VBT info
David Müller (ELSOFT AG) [Sat, 29 Aug 2009 06:54:45 +0000 (08:54 +0200)]
drm/i915: Improve CRTDDC mapping by using VBT info

Use VBT information to determine which DDC bus to use for CRTDDC.
Fall back to GPIOA if VBT info is not available.

Signed-off-by: David Müller <d.mueller@elsoft.ch>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested on: 855 (David), and 945GM, 965GM, GM45, and G45 (anholt)

8 years agodrm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.
Eric Anholt [Sat, 29 Aug 2009 19:49:51 +0000 (12:49 -0700)]
drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.

The lack of a proper LRU was partially worked around by taking the fence
from the object containing the oldest seqno.  But if there are multiple
objects inactive, then they don't have seqnos and the first fence reg
among them would be chosen.  If you were trying to copy data between two
mappings, this could result in each page fault stealing the fence from
the other argument, and your application hanging.
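
A minimal sketch of LRU victim selection, assuming a hypothetical pool of
four fence registers and a logical clock; the structure and function
names are illustrative and the driver's real bookkeeping differs.  A free
register is preferred, otherwise the least recently used one is stolen,
so two objects that fault alternately do not keep taking the same
register from each other:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FENCES 4 /* illustrative pool size */

struct fence_reg {
	int owner;               /* object id, or -1 if the register is free */
	unsigned long last_used; /* logical timestamp of the last use */
};

struct fence_pool {
	struct fence_reg regs[NUM_FENCES];
	unsigned long clock;
};

/* Mark a register as just used, moving it to the recent end of the LRU. */
static void fence_touch(struct fence_pool *p, size_t i)
{
	assert(i < NUM_FENCES);
	p->regs[i].last_used = ++p->clock;
}

/* Give obj a register: a free one if possible, else the LRU one. */
static size_t fence_get(struct fence_pool *p, int obj)
{
	size_t victim = 0;

	for (size_t i = 0; i < NUM_FENCES; i++) {
		if (p->regs[i].owner == -1) {
			victim = i;
			break;
		}
		if (p->regs[i].last_used < p->regs[victim].last_used)
			victim = i;
	}
	p->regs[victim].owner = obj;
	fence_touch(p, victim);
	return victim;
}
```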


Cc: Stable Team <stable@kernel.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
8 years agoperf_counter: Fix /0 bug in swcounters
Peter Zijlstra [Fri, 28 Aug 2009 15:10:47 +0000 (17:10 +0200)]
perf_counter: Fix /0 bug in swcounters

We have a race in the swcounter stuff where we can start
counting a counter that has never been enabled; this leads to a
/0 situation.

This patch avoids the /0 but doesn't close the race; closing it
would need a new counter state.

The race is due to perf_swcounter_is_counting() which cannot
discern between disabled due to scheduled out, and disabled for
any other reason.
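
A minimal sketch of this kind of guard, assuming the division is by a
sampling period that stays zero until the counter has been enabled.  The
structure and function here (sw_counter, sw_counter_add) are hypothetical
and do not reproduce the actual perf_counter code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software counter; field names are illustrative only. */
struct sw_counter {
	uint64_t period;  /* sampling period; 0 until the counter is enabled */
	uint64_t pending; /* events accumulated toward the next overflow */
};

/*
 * Add nr events and return how many sampling periods elapsed.  A zero
 * period means the counter was never set up, so return before dividing.
 */
static uint64_t sw_counter_add(struct sw_counter *c, uint64_t nr)
{
	uint64_t overflows;

	assert(c != NULL);
	if (c->period == 0)
		return 0;

	c->pending += nr;
	overflows = c->pending / c->period;
	c->pending %= c->period;
	return overflows;
}
```

As in the description above, the check only avoids the crash; the event
is still delivered to a counter that should not have been counting.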

Such a crash has been seen by Ingo:

[  967.092372] divide error: 0000 [#1] SMP
[  967.096499] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
[  967.104846] CPU 5
[  967.106965] Modules linked in:
[  967.110169] Pid: 3351, comm: hackbench Not tainted 2.6.31-rc8-tip-01158-gd940a54-dirty #1568 X8DTN
[  967.119456] RIP: 0010:[<ffffffff810c0aba>]  [<ffffffff810c0aba>] perf_swcounter_ctx_event+0x127/0x1af
[  967.129137] RSP: 0018:ffff8801a95abd70  EFLAGS: 00010046
[  967.134699] RAX: 0000000000000002 RBX: ffff8801bd645c00 RCX: 0000000000000002
[  967.142162] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8801bd645d40
[  967.149584] RBP: ffff8801a95abdb0 R08: 0000000000000001 R09: ffff8801a95abe00
[  967.157042] R10: 0000000000000037 R11: ffff8801aa1245f8 R12: ffff8801a95abe00
[  967.164481] R13: ffff8801a95abe00 R14: ffff8801aa1c0e78 R15: 0000000000000001
[  967.171953] FS:  0000000000000000(0000) GS:ffffc90000a00000(0063) knlGS:00000000f7f486c0
[  967.180406] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  967.186374] CR2: 000000004822c0ac CR3: 00000001b19a2000 CR4: 00000000000006e0
[  967.193770] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  967.201224] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  967.208692] Process hackbench (pid: 3351, threadinfo ffff8801a95aa000, task ffff8801a96b0000)
[  967.217607] Stack:
[  967.219711]  0000000000000000 0000000000000037 0000000200000001 ffffc90000a1107c
[  967.227296] <0> ffff8801a95abe00 0000000000000001 0000000000000001 0000000000000037
[  967.235333] <0> ffff8801a95abdf0 ffffffff810c0c20 0000000200a14f30 ffff8801a95abe40
[  967.243532] Call Trace:
[  967.246103]  [<ffffffff810c0c20>] do_perf_swcounter_event+0xde/0xec
[  967.252635]  [<ffffffff810c0ca7>] perf_tpcounter_event+0x79/0x7b
[  967.258957]  [<ffffffff81037f73>] ftrace_profile_sched_switch+0xc0/0xcb
[  967.265791]  [<ffffffff8155f22d>] schedule+0x429/0x4c4
[  967.271156]  [<ffffffff8100c01e>] int_careful+0xd/0x14

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1251472247.17617.74.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
8 years agoMerge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux...
Linus Torvalds [Sat, 29 Aug 2009 05:41:05 +0000 (19:41 -1000)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI: don't free non-existent backlight in acpi video module
  toshiba_acpi: return on a fail path
  ACPICA: Windows compatibility fix: same buffer/string store

8 years agoMerge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
Linus Torvalds [Sat, 29 Aug 2009 05:39:44 +0000 (19:39 -1000)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify

* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  inotify: update the group mask on mark addition
  inotify: fix length reporting and size checking
  inotify: do not send a block of zeros when no pathname is available

8 years agoparisc: fix warning in traps.c
Grant Grundler [Fri, 28 Aug 2009 19:00:36 +0000 (15:00 -0400)]
parisc: fix warning in traps.c

On Tue, Aug 18, 2009 at 01:45:17PM -0400, John David Anglin wrote:
>  CC      arch/parisc/kernel/traps.o
> arch/parisc/kernel/traps.c: In function 'handle_interruption':
> arch/parisc/kernel/traps.c:535:18: warning: operation on 'regs->iasq[0]'
> may be undefined

Yes - Line 535 should use both [0] and [1].

Reported-by: John David Anglin <dave@hiauly1.hia.nrc.ca>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoSUNRPC: Fix rpc_task_force_reencode
Trond Myklebust [Fri, 28 Aug 2009 15:12:12 +0000 (11:12 -0400)]
SUNRPC: Fix rpc_task_force_reencode

This patch fixes the bug that was reported in

If we're in the case where we need to force a reencode and then resend of
the RPC request, due to xprt_transmit failing with a networking error, then
we _must_ retransmit the entire request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomodules: Fix build error in the !CONFIG_KALLSYMS case
Ingo Molnar [Fri, 28 Aug 2009 08:44:56 +0000 (10:44 +0200)]
modules: Fix build error in the !CONFIG_KALLSYMS case

> James Bottomley (1):
>       module: workaround duplicate section names

-tip testing found that this patch breaks the build on x86 if
CONFIG_KALLSYMS is disabled:

 kernel/module.c: In function ‘load_module’:
 kernel/module.c:2367: error: ‘struct module’ has no member named ‘sect_attrs’
 distcc[8269] ERROR: compile kernel/module.c on ph/32 failed
 make[1]: *** [kernel/module.o] Error 1
 make: *** [kernel] Error 2
 make: *** Waiting for unfinished jobs....

Commit 1b364bf misses the fact that section attributes are only
built and dealt with if kallsyms is enabled. The patch below fixes
that.

( note, technically speaking this should depend on CONFIG_SYSFS as
  well but this patch is correct too and keeps the #ifdef less
  intrusive - in the KALLSYMS && !SYSFS case the code is a NOP. )

Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Replaced patch with a slightly cleaner variation by James Bottomley ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 29 Aug 2009 05:32:32 +0000 (19:32 -1000)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix vSMP boot crash
  x86, xen: Initialize cx to suppress warning
  x86, xen: Suppress WP test on Xen

8 years agoACPI: don't free non-existent backlight in acpi video module
Keith Packard [Thu, 6 Aug 2009 22:57:54 +0000 (15:57 -0700)]
ACPI: don't free non-existent backlight in acpi video module

acpi_video_put_one_device was attempting to remove sysfs entries and
unregister a backlight device without first checking that said backlight
device structure had been created.

Signed-off-by: Keith Packard <keithp@keithp.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
8 years agotoshiba_acpi: return on a fail path
Jiri Slaby [Thu, 6 Aug 2009 22:57:51 +0000 (15:57 -0700)]
toshiba_acpi: return on a fail path

Return from bt_rfkill_poll() when hci_get_radio_state() fails.

value is invalid in that case and should not be assigned to the rfkill
state.

This also fixes a double unlock bug.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
8 years agoACPICA: Windows compatibility fix: same buffer/string store
Lin Ming [Wed, 26 Aug 2009 01:01:34 +0000 (09:01 +0800)]
ACPICA: Windows compatibility fix: same buffer/string store

Fix a compatibility issue when the same buffer or string is
stored to itself. This has been seen in the field. Previously,
ACPICA would zero out the buffer/string. Now, the operation is
treated as a NOP.
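
A minimal sketch of the self-store case, using a hypothetical buffer
object and store routine rather than ACPICA's real data structures.  If
the target is cleared before the copy, storing an object to itself wipes
the source as well, so the routine returns early when both pointers name
the same object:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical buffer object; not ACPICA's real data structure. */
struct buf_obj {
	unsigned char data[16];
	size_t length;
};

/* Store src into dst.  Storing an object to itself is a no-op. */
static void store_buffer(struct buf_obj *dst, const struct buf_obj *src)
{
	assert(dst && src && src->length <= sizeof(dst->data));

	if (dst == src)
		return; /* same object: nothing to do */

	memset(dst->data, 0, sizeof(dst->data)); /* clear the old contents */
	memcpy(dst->data, src->data, src->length);
	dst->length = src->length;
}
```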


Reported-by: Rezwanul Kabir <Rezwanul_Kabir@Dell.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
8 years agoinotify: update the group mask on mark addition
Eric Paris [Fri, 28 Aug 2009 16:50:47 +0000 (12:50 -0400)]
inotify: update the group mask on mark addition

Separating the addition and update of marks in inotify resulted in a
regression in that inotify never gets events.  The inotify group mask is
always 0.  This mask should be updated any time a new mark is added.

Signed-off-by: Eric Paris <eparis@redhat.com>
8 years agoinotify: fix length reporting and size checking
Eric Paris [Fri, 28 Aug 2009 15:57:55 +0000 (11:57 -0400)]
inotify: fix length reporting and size checking

0db501bd0610ee0c0 introduced a regression in that it now sends a nul
terminator but the length accounting when checking for space or
reporting to userspace did not take this into account.  This corrects
all of the rounding logic.

Signed-off-by: Eric Paris <eparis@redhat.com>
8 years agoinotify: do not send a block of zeros when no pathname is available
Brian Rogers [Fri, 28 Aug 2009 14:00:05 +0000 (10:00 -0400)]
inotify: do not send a block of zeros when no pathname is available

When an event has no pathname, there's no need to pad it with a null byte and
therefore generate an inotify_event sized block of zeros. This fixes a
regression introduced by commit 0db501bd0610ee0c0aca84d927f90bcccd09e2bd where
my system wouldn't finish booting because some process was being confused by
this.

Signed-off-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
8 years agoperf_counters: Increase paranoia level
Ingo Molnar [Fri, 28 Aug 2009 11:44:53 +0000 (13:44 +0200)]
perf_counters: Increase paranoia level

Per-cpu counters are an ASLR information leak as they show
the execution other tasks do. Increase the paranoia level
to 1, which disallows per-cpu counters. (they still allow
counting/profiling of own tasks - and admin can profile

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
8 years agoocfs2: invalidate dentry if its dentry_lock isn't initialized.
Tao Ma [Thu, 27 Aug 2009 06:46:56 +0000 (14:46 +0800)]
ocfs2: invalidate dentry if its dentry_lock isn't initialized.

In commit a5a0a630922a2f6a774b6dac19f70cb5abd86bb0, when
ocfs2_attch_dentry_lock fails, we call an extra iput and reset
dentry->d_fsdata to NULL. This resolves a bug, but the fix is
incomplete: the dentry is still there. When we want to use
it again, ocfs2_dentry_revalidate doesn't catch it and returns
true. That makes a later ocfs2_dentry_lock panic.
One such bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162.

The resolution is to add a check for dentry->d_fsdata in the
revalidate process and return false if dentry->d_fsdata is NULL,
so that ocfs2_lookup will be called again.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
8 years agoLinux 2.6.31-rc8
Linus Torvalds [Fri, 28 Aug 2009 00:59:04 +0000 (17:59 -0700)]
Linux 2.6.31-rc8

8 years agomodule: workaround duplicate section names
James Bottomley [Wed, 26 Aug 2009 12:34:12 +0000 (22:04 +0930)]
module: workaround duplicate section names

The root cause is a duplicate section name (.text); is this legal?
[ Amerigo Wang: "AFAIK, yes." ]

However, there's a problem with commit
6d76013381ed28979cd122eb4b249a88b5e384fa in that if you fail to allocate
a mod->sect_attrs (in this case it's null because of the duplication),
it still gets used without checking in add_notes_attrs()

This should fix it

[ This patch leaves other problems, particularly the sections directory,
  but recent parisc toolchains seem to produce these modules and this
  prevents a crash and is a minimal change -- RR ]

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomodule: fix BUG_ON() for powerpc (and other function descriptor archs)
Rusty Russell [Wed, 26 Aug 2009 12:32:54 +0000 (22:02 +0930)]
module: fix BUG_ON() for powerpc (and other function descriptor archs)

The rarely-used symbol_put_addr() needs to use dereference_function_descriptor
on powerpc.

Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoxenfb: connect to backend before registering fb
Jeremy Fitzhardinge [Thu, 27 Aug 2009 19:22:43 +0000 (12:22 -0700)]
xenfb: connect to backend before registering fb

As soon as the framebuffer is registered, our methods may be called by the
kernel. This leads to a crash as xenfb_refresh() gets called before we have
the irq.

Connect to the backend before registering our framebuffer with the kernel.

[ Fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=14059 ]

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
Linus Torvalds [Thu, 27 Aug 2009 19:26:02 +0000 (12:26 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify

* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  inotify: Ensure we alwasy write the terminating NULL.
  inotify: fix locking around inotify watching in the idr
  inotify: do not BUG on idr entries at inotify destruction
  inotify: seperate new watch creation updating existing watches

8 years agolmb: Remove __init from lmb_end_of_DRAM()
Benjamin Herrenschmidt [Thu, 27 Aug 2009 07:20:30 +0000 (17:20 +1000)]
lmb: Remove __init from lmb_end_of_DRAM()

We call lmb_end_of_DRAM() to test whether a DMA mask is ok on a machine
without IOMMU, but this function is marked as __init.

I don't think there's a clean way to get the top of RAM: max_pfn doesn't
appear to include highmem, or I missed something (or we have a bug :-).
So for now, let's just avoid having a broken 2.6.31 by making this
function non-__init, and we can revisit later.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh...
Linus Torvalds [Thu, 27 Aug 2009 19:24:08 +0000 (12:24 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ericvh/v9fs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: update documentation pointers
  9p: remove unnecessary v9fses->options which duplicates the mount string
  net/9p: insulate the client against an invalid error code sent by a 9p server
  9p: Add missing cast for the error return value in v9fs_get_inode
  9p: Remove redundant inode uid/gid assignment
  9p: Fix possible regressions when ->get_sb fails.
  9p: Fix v9fs show_options
  9p: Fix possible memleak in v9fs_inode_from fid.
  9p: minor comment fixes
  9p: Fix possible inode leak in v9fs_get_inode.
  9p: Check for error in return value of v9fs_fid_add

8 years agoipv4: make ip_append_data() handle NULL routing table
Julien TINNES [Thu, 27 Aug 2009 13:26:58 +0000 (15:26 +0200)]
ipv4: make ip_append_data() handle NULL routing table

Add a check in ip_append_data() for NULL *rtp to prevent future bugs in
callers from being exploitable.

Signed-off-by: Julien Tinnes <julien@cr0.org>
Signed-off-by: Tavis Ormandy <taviso@sdf.lonestar.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoAFS: Stop readlink() on AFS crashing due to NULL 'file' ptr
David Howells [Thu, 27 Aug 2009 12:09:06 +0000 (13:09 +0100)]
AFS: Stop readlink() on AFS crashing due to NULL 'file' ptr

kAFS crashes when asked to read a symbolic link because page_getlink()
passes a NULL file pointer to read_mapping_page(), but afs_readpage()
expects a file pointer from which to extract a key.

Modify afs_readpage() to request the appropriate key from the calling
process's keyrings if a file struct is not supplied with one attached.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoinotify: Ensure we alwasy write the terminating NULL.
Eric W. Biederman [Thu, 27 Aug 2009 10:20:04 +0000 (03:20 -0700)]
inotify: Ensure we alwasy write the terminating NULL.

Before the rewrite copy_event_to_user always wrote a terminating '\0'
byte to user space after the filename.  Since the rewrite that
terminating byte was skipped if your filename is exactly a multiple of
event_size.  Ouch!

So add one byte to name_size before we round up and use clear_user to
set userspace to zero like /dev/zero does instead of copying the
strange nul_inotify_event.  I can't quite convince myself len_to_zero
will never exceed 16 and even if it doesn't clear_user should be more
efficient and a more accurate reflection of what the code is trying to

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
8 years agoinotify: fix locking around inotify watching in the idr
Eric Paris [Mon, 24 Aug 2009 20:03:35 +0000 (16:03 -0400)]
inotify: fix locking around inotify watching in the idr

There are races around the idr storage of inotify watches.  It's possible
that a watch could be found from sys_inotify_rm_watch() in the idr, but it
could be removed from the idr before that code does its removal.  Move the
locking and the refcnt'ing so that these have to happen atomically.
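
A minimal sketch of the pattern, with a fixed-size table standing in for
the idr and a C11 atomic_flag standing in for the kernel lock.  All names
here are hypothetical.  The lookup and the reference count increment
share one critical section, so a concurrent removal cannot slip in
between finding the watch and pinning it:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_WATCHES 8 /* illustrative table size */

struct watch {
	int refcnt;
};

/* Hypothetical stand-in for the idr plus the lock that protects it. */
struct watch_table {
	atomic_flag lock;
	struct watch *slots[MAX_WATCHES];
};

static void table_lock(struct watch_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		; /* spin until the lock is free */
}

static void table_unlock(struct watch_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Find a watch and take a reference on it under the same lock. */
static struct watch *watch_find_get(struct watch_table *t, int wd)
{
	struct watch *w;

	assert(wd >= 0 && wd < MAX_WATCHES);
	table_lock(t);
	w = t->slots[wd];
	if (w)
		w->refcnt++;
	table_unlock(t);
	return w;
}

/* Unlink a watch; the caller then drops the table's reference to it. */
static struct watch *watch_unlink(struct watch_table *t, int wd)
{
	struct watch *w;

	assert(wd >= 0 && wd < MAX_WATCHES);
	table_lock(t);
	w = t->slots[wd];
	t->slots[wd] = NULL;
	table_unlock(t);
	return w;
}
```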

Signed-off-by: Eric Paris <eparis@redhat.com>
8 years agoinotify: do not BUG on idr entries at inotify destruction
Eric Paris [Mon, 24 Aug 2009 20:03:35 +0000 (16:03 -0400)]
inotify: do not BUG on idr entries at inotify destruction

If an inotify watch is left in the idr when an fsnotify group is destroyed
this will lead to a BUG.  This is not a dangerous situation and really
indicates a programming bug and leak of memory.  This patch changes it to
use a WARN and a printk rather than killing people's boxes.

Signed-off-by: Eric Paris <eparis@redhat.com>
8 years agoinotify: seperate new watch creation updating existing watches
Eric Paris [Mon, 24 Aug 2009 20:03:35 +0000 (16:03 -0400)]
inotify: seperate new watch creation updating existing watches

Nothing is known to be wrong with the inotify watch addition/modification,
but this patch separates the two code paths to make each of them easy to
verify as correct.

Signed-off-by: Eric Paris <eparis@redhat.com>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Thu, 27 Aug 2009 03:54:48 +0000 (20:54 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  virtio: net refill on out-of-memory
  smc91x: fix compilation on SMP

8 years agoMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Linus Torvalds [Thu, 27 Aug 2009 03:39:31 +0000 (20:39 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/ps3: Update ps3_defconfig
  powerpc/ps3: Add missing check for PS3 to rtc-ps3 platform device registration

8 years agopowerpc/ps3: Update ps3_defconfig
Geoff Levand [Tue, 25 Aug 2009 07:53:35 +0000 (07:53 +0000)]
powerpc/ps3: Update ps3_defconfig

Update ps3_defconfig.

 o Refresh for 2.6.31.
 o Remove MTD support.
 o Add more HID drivers.

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
8 years agopowerpc/ps3: Add missing check for PS3 to rtc-ps3 platform device registration
Geert Uytterhoeven [Sun, 23 Aug 2009 22:54:32 +0000 (22:54 +0000)]
powerpc/ps3: Add missing check for PS3 to rtc-ps3 platform device registration

On non-PS3, we get:

| kernel BUG at drivers/rtc/rtc-ps3.c:36!

because the rtc-ps3 platform device is registered unconditionally in a kernel
with builtin support for PS3.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Thu, 27 Aug 2009 03:17:07 +0000 (20:17 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  IMA: iint put in ima_counts_get and put

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux...
Linus Torvalds [Thu, 27 Aug 2009 03:16:38 +0000 (20:16 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/geert/linux-m68k

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k,m68knommu: Wire up rt_tgsigqueueinfo and perf_counter_open
  m68k: Fix redefinition of pgprot_noncached
  arch/m68k/include/asm/motorola_pgalloc.h: fix kunmap arg
  m68k: cnt reaches -1, not 0
  m68k: count can reach 51, not 50

8 years agoleds: after setting inverted attribute, we must update the LED
Thadeu Lima de Souza Cascardo [Wed, 26 Aug 2009 21:29:32 +0000 (14:29 -0700)]
leds: after setting inverted attribute, we must update the LED

If we change the inverted attribute to another value, the LED will not be
inverted until we change the GPIO state.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Samuel R. C. Vale <srcvale@holoscopio.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoleds: fix multiple requests and releases of IRQ for GPIO LED Trigger
Thadeu Lima de Souza Cascardo [Wed, 26 Aug 2009 21:29:31 +0000 (14:29 -0700)]
leds: fix multiple requests and releases of IRQ for GPIO LED Trigger

When setting the same GPIO number, multiple shared IRQ requests will be
done without freeing the previous request.  It will also try to free a
failed request or an already freed IRQ if 0 was written to the gpio file.

All these oopses and leaks were fixed with the following solution: keep the
previously allocated GPIO (if any) allocated in case the new request
fails.  The alternative solution would deallocate the previously allocated
GPIO and set gpio to 0.
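
The "keep the previous request on failure" policy can be sketched as
follows.  This is a userspace illustration, not the driver code: struct
gpio_trig, stub_request(), stub_release() and gpio_trig_set() are all
hypothetical names, and the stubs only pretend to request and release an
IRQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trigger state: gpio == 0 means nothing is requested. */
struct gpio_trig {
	unsigned int gpio;
};

static int releases;	/* counts calls to the stub release function */

/* Stub standing in for the IRQ request; pretends lines >= 32 are invalid. */
static bool stub_request(unsigned int gpio)
{
	return gpio < 32;
}

/* Stub standing in for the IRQ release. */
static void stub_release(unsigned int gpio)
{
	(void)gpio;
	releases++;
}

/* Returns 0 on success, -1 if the new request failed (old one is kept). */
static int gpio_trig_set(struct gpio_trig *t, unsigned int new_gpio)
{
	assert(t != NULL);

	if (new_gpio == t->gpio)
		return 0;		/* same line: do not request it twice */

	if (new_gpio != 0 && !stub_request(new_gpio))
		return -1;		/* failure: previous request stays live */

	if (t->gpio != 0)
		stub_release(t->gpio);	/* release only what was really held */
	t->gpio = new_gpio;
	return 0;
}
```

Writing the same number twice requests nothing new, a failed request
leaves the old one in place, and writing 0 twice releases only once.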

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Samuel R. C. Vale <srcvale@holoscopio.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoacpi processor: remove superfluous warning message
Frans Pop [Wed, 26 Aug 2009 21:29:30 +0000 (14:29 -0700)]
acpi processor: remove superfluous warning message

This failure is very common on many platforms.  Handling it in the ACPI
processor driver is enough, and we don't need a warning message unless

Based on a patch from Zhang Rui.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13389

Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoACPI processor: force throttling state when BIOS returns incorrect value
Frans Pop [Wed, 26 Aug 2009 21:29:29 +0000 (14:29 -0700)]
ACPI processor: force throttling state when BIOS returns incorrect value

If the BIOS reports an invalid throttling state (which seems to be
fairly common after system boot), a reset is done to state T0.
Because of a check in acpi_processor_get_throttling_ptc(), the reset
never actually gets executed, which results in the error reoccurring
on every access of for example /proc/acpi/processor/CPU0/throttling.

Add a 'force' option to acpi_processor_set_throttling() to ensure
the reset really takes effect.
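
A rough model of why a 'force' flag helps, assuming the skipped reset is
caused by a "state already matches" shortcut.  That shortcut, struct
throttle and throttle_set() are assumptions made for illustration and are
not taken from the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a processor's throttling bookkeeping. */
struct throttle {
	int cached_state;	/* state the driver believes is active */
	int hw_writes;		/* how many times "hardware" was programmed */
};

static int throttle_set(struct throttle *t, int state, bool force)
{
	assert(t != NULL);

	/* Assumed shortcut: skip the write when the cache already matches. */
	if (!force && state == t->cached_state)
		return 0;

	t->hw_writes++;		/* stands in for programming the hardware */
	t->cached_state = state;
	return 0;
}
```

If the cached state already reads T0 while the hardware disagrees, only a
forced call reaches the write in this model.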

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13389

This patch, together with the next one, fixes a regression introduced in
2.6.30, listed on the regression list. They have been available for 2.5
months now in bugzilla, but have not been picked up, despite various
reminders and without any reason given.

Google shows that numerous people are hitting this issue. The issue is in
itself relatively minor, but the bug in the code is clear.

The patches have been in all my kernels and today testing has shown that
throttling works correctly with the patches applied when the system
overheats (http://bugzilla.kernel.org/show_bug.cgi?id=13918#c14).

Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agowmi: fix kernel panic when stack protection enabled.
Costantino Leandro [Wed, 26 Aug 2009 21:29:28 +0000 (14:29 -0700)]
wmi: fix kernel panic when stack protection enabled.

A kernel panic arises when stack protection is enabled, because strncat()
always appends a terminating NUL byte '\0'.  So in functions like this one
(wmi_query_block):

        char wc[4]="WC";
        strncat(method, block->object_id, 2);

the length of wc should allow for n+1 appended bytes (wc[5]), or a stack
protection fault will arise.  This is not noticeable when stack protection
is disabled, but it isn't good either.
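
The size rule can be checked in plain C.  In this sketch
strncat_dest_size() and build_method() are hypothetical helpers; only the
"WC" prefix and the two-byte append come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Smallest destination size for which strncat(dest, src, n) is safe. */
static size_t strncat_dest_size(const char *dest, size_t n)
{
	return strlen(dest) + n + 1;	/* existing text + n bytes + NUL */
}

/* Hypothetical helper: build "WC" plus the first two bytes of object_id. */
static void build_method(char out[5], const char *object_id)
{
	char method[5] = "WC";		/* 2 + 2 + 1, not 4 */

	assert(strncat_dest_size(method, 2) <= sizeof(method));
	strncat(method, object_id, 2);
	memcpy(out, method, sizeof(method));
}
```

With a four-byte buffer the assertion would fail, which is the same
one-byte overflow the stack protector catches.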

Panic Trace
   .... stack-protector: kernel stack corrupted in : fa7b182c
   [<c04a6c40>] ? panic+0x45/0xd9
   [<c012925d>] ? __stack_chk_fail+0x1c/0x40
   [<fa7b182c>] ? wmi_query_block+0x15a/0x162 [wmi]
   [<fa7b182c>] ? wmi_query_block+0x15a/0x162 [wmi]
   [<fa7e7000>] ? acer_wmi_init+0x00/0x61a [acer_wmi]
   [<fa7e7135>] ? acer_wmi_init+0x135/0x61a [acer_wmi]
   [<c0101159>] ? do_one_initcall+0x50+0x126

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13514

Signed-off-by: Costantino Leandro <lcostantino@gmail.com>
Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Len Brown <len.brown@intel.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoacpi: don't call acpi_processor_init if acpi is disabled
Yinghai Lu [Wed, 26 Aug 2009 21:29:26 +0000 (14:29 -0700)]
acpi: don't call acpi_processor_init if acpi is disabled

Jens reported early_ioremap messages with old ASUS board...

> [    1.507461] pci 0000:00:09.0: Firmware left e100 interrupts enabled; disabling
> [    1.532778] early_ioremap(3fffd0800000005c) [0] => Pid: 1, comm: swapper Not tainted 2.6.31-rc4 #36
> [    1.561007] Call Trace:
> [    1.568638]  [<c136e48b>] ? printk+0x18/0x1d
> [    1.581734]  [<c15513ff>] __early_ioremap+0x74/0x1e9
> [    1.596898]  [<c15515aa>] early_ioremap+0x1a/0x1c
> [    1.611270]  [<c154a187>] __acpi_map_table+0x18/0x1a
> [    1.626451]  [<c135a7f8>] acpi_os_map_memory+0x1d/0x25
> [    1.642129]  [<c119459c>] acpi_tb_verify_table+0x20/0x49
> [    1.658321]  [<c1193e50>] acpi_get_table_with_size+0x53/0xa1
> [    1.675553]  [<c1193eae>] acpi_get_table+0x10/0x15
> [    1.690192]  [<c155cc19>] acpi_processor_init+0x23/0xab
> [    1.706126]  [<c1001043>] do_one_initcall+0x33/0x180
> [    1.721279]  [<c155cbf6>] ? acpi_processor_init+0x0/0xab
> [    1.737479]  [<c106893a>] ? register_irq_proc+0xaa/0xc0
> [    1.753411]  [<c10689b7>] ? init_irq_proc+0x67/0x80
> [    1.768316]  [<c15405e7>] kernel_init+0x120/0x176
> [    1.782678]  [<c15404c7>] ? kernel_init+0x0/0x176
> [    1.797062]  [<c10038b7>] kernel_thread_helper+0x7/0x10
> [    1.812984] 00000080 + ffe00000

That is rather late for early_ioremap().
acpi_gbl_permanent_mmap should be set in acpi_early_init()
if ACPI is not disabled

and we have
> [    0.000000] ASUS P2B-DS detected: force use of acpi=ht

just don't load acpi_processor_init...

Reported-and-tested-by: Jens Rosenboom <jens@leia.mcbone.net>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agothermal_sys: check get_temp return value
Michael Brunner [Wed, 26 Aug 2009 21:29:25 +0000 (14:29 -0700)]
thermal_sys: check get_temp return value

The return value of the get_temp function is not checked when doing a
thermal zone update.  This may lead to a critical shutdown if get_temp
fails and the content of the temp variable is incorrectly set higher than
the critical trip point.

This has been observed on a system with incorrect ACPI implementation
where the corresponding methods were not serialized and therefore
sometimes triggered ACPI errors (AE_ALREADY_EXISTS).  The following
critical shutdowns indicated a temperature of 2097 C, which was obviously
wrong.

The patch adds a return value check that prints a warning and skips all
trip point evaluations if get_temp fails.  The trip points are evaluated
again on the next polling interval with a successful get_temp call.
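
The check can be sketched as below.  The callback type, the function
names and the temperature units are hypothetical; the sketch only shows a
failed reading being ignored rather than compared against the critical
trip point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sensor callback: returns 0 on success, nonzero on error. */
typedef int (*get_temp_fn)(long *temp);

/* Returns true only when a valid reading reaches the critical trip point. */
static bool zone_is_critical(get_temp_fn get_temp, long critical_temp)
{
	long temp = 0;

	assert(get_temp != NULL);

	if (get_temp(&temp) != 0)
		return false;	/* reading failed: skip trip point evaluation */

	return temp >= critical_temp;
}

/* Example callbacks (hypothetical): one fails and leaves garbage behind. */
static int broken_sensor(long *temp)
{
	*temp = 2097;		/* bogus value that must not be trusted */
	return -1;
}

static int hot_sensor(long *temp)
{
	*temp = 120;
	return 0;
}
```

Without the return value check, the bogus 2097 from broken_sensor() would
be compared against the trip point and trigger a shutdown.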

Signed-off-by: Michael Brunner <mibru@gmx.de>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoclone(): fix race between copy_process() and de_thread()
Oleg Nesterov [Wed, 26 Aug 2009 21:29:24 +0000 (14:29 -0700)]
clone(): fix race between copy_process() and de_thread()

Spotted by Hiroshi Shimamoto who also provided the test-case below.

copy_process() uses signal->count as a reference counter, but it is not.
This test case

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <pthread.h>

/* Idle forever so the thread group stays populated. */
void *null_thread(void *p)
{
	for (;;)
		sleep(1);

	return NULL;
}

/* Exec from a non-leader thread, which runs de_thread(). */
void *exec_thread(void *p)
{
	execl("/bin/true", "/bin/true", NULL);

	return null_thread(p);
}

int main(int argc, char **argv)
{
	for (;;) {
		pid_t pid;
		int ret, status;

		pid = fork();
		if (pid < 0)
			break;

		if (!pid) {
			pthread_t tid;

			pthread_create(&tid, NULL, exec_thread, NULL);
			/* Keep cloning threads to race with the exec. */
			for (;;)
				pthread_create(&tid, NULL, null_thread, NULL);
		}

		do {
			ret = waitpid(pid, &status, 0);
		} while (ret == -1 && errno == EINTR);
	}

	return 0;
}

quickly creates an unkillable task.

If copy_process(CLONE_THREAD) races with de_thread(),
copy_signal()->atomic(signal->count) breaks the signal->notify_count
logic, and the execing thread can hang forever in kernel space.

Change copy_process() to increment count/live only when we know for sure
we can't fail.  In this case the forked thread will take care of its
reference to signal correctly.

If copy_process() fails, check the CLONE_THREAD flag.  If it is set - do
nothing, the counters were not changed and current belongs to the same
thread group.  If it is not set, ->signal must be released in any case
(and ->count must be == 1), the forked child is the only thread in the
thread group.

We need more cleanups here, in particular signal->count should not be used
by de_thread/__exit_signal at all.  This patch only fixes the bug.

Reported-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Tested-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: fix for infinite churning of mlocked pages
Minchan Kim [Wed, 26 Aug 2009 21:29:23 +0000 (14:29 -0700)]
mm: fix for infinite churning of mlocked pages

An mlocked page might lose the isolation race.  This causes the page to
clear PG_mlocked while it remains in a VM_LOCKED vma.  This means it can
be put onto the [in]active list.  We can rescue it by using try_to_unmap()
in shrink_page_list().

But, as Wu Fengguang pointed out, vmscan has a bug.  If the page has
PG_referenced, it can't reach try_to_unmap() in shrink_page_list() but is
put into the active list.  If the page is referenced repeatedly, it can
remain on the [in]active list without being moved to the unevictable
list.

This patch fixes it.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoflex_array: convert element_nr formals to unsigned
David Rientjes [Wed, 26 Aug 2009 21:29:22 +0000 (14:29 -0700)]
flex_array: convert element_nr formals to unsigned

It's problematic to allow signed element_nr's or total's to be passed as
part of the flex array API.

flex_array_alloc() allows total_nr_elements to be set to a negative
quantity, which is obviously erroneous.

flex_array_get() and flex_array_put() allow negative array indices in
dereferencing an array part, which could address memory mapped before
struct flex_array.

The fix is to convert all existing element_nr formals to be qualified as
unsigned.  Existing checks to compare it to total_nr_elements or the max
array size based on element_size need not be changed.
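
Why the existing upper-bound check is enough once the formal is unsigned
can be shown with a small sketch.  TOTAL_NR_ELEMENTS and the function
names are hypothetical and do not come from the flex array API.

```c
#include <assert.h>
#include <stdbool.h>

#define TOTAL_NR_ELEMENTS 8	/* hypothetical array capacity */

/* Signed formal: a negative index slips past the upper-bound check. */
static bool index_ok_signed(int element_nr)
{
	return element_nr < TOTAL_NR_ELEMENTS;
}

/* Unsigned formal: -1 converts to UINT_MAX and fails the same check. */
static bool index_ok_unsigned(unsigned int element_nr)
{
	return element_nr < TOTAL_NR_ELEMENTS;
}

/* Small self-check showing the difference. */
static void index_check_demo(void)
{
	assert(index_ok_signed(-1));			/* the bug */
	assert(!index_ok_unsigned((unsigned int)-1));	/* the fix */
	assert(index_ok_unsigned(TOTAL_NR_ELEMENTS - 1));
}
```

The comparison itself is unchanged between the two functions; only the
type of the formal differs.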

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoflex_array: declare parts member to have incomplete type
David Rientjes [Wed, 26 Aug 2009 21:29:21 +0000 (14:29 -0700)]
flex_array: declare parts member to have incomplete type

The `parts' member of struct flex_array should evaluate to an incomplete
type so that sizeof() cannot be used and C99 does not require the
zero-length specification.
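
A minimal sketch of a C99 flexible array member, assuming a simplified
stand-in for the real structure.  struct fa_sketch and fa_sketch_alloc()
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct flex_array. */
struct fa_sketch {
	unsigned int nr_parts;
	void *parts[];		/* C99 flexible array member: incomplete type */
};

/*
 * sizeof(((struct fa_sketch *)0)->parts) would not compile, so the
 * storage for parts has to be sized explicitly at allocation time.
 */
static struct fa_sketch *fa_sketch_alloc(unsigned int nr_parts)
{
	struct fa_sketch *fa;

	assert(nr_parts > 0);
	fa = calloc(1, sizeof(*fa) + nr_parts * sizeof(fa->parts[0]));
	if (fa)
		fa->nr_parts = nr_parts;
	return fa;
}
```

Declaring the member as parts[0] instead relies on a compiler extension
and lets sizeof() silently evaluate to zero.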

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoflex_array: fix flex_array_free_parts comment
David Rientjes [Wed, 26 Aug 2009 21:29:20 +0000 (14:29 -0700)]
flex_array: fix flex_array_free_parts comment

flex_array_free_parts() does not take `src' or `element_nr' formals, so
remove their respective comments.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoflex_array: fix get function for elements in base starting at non-zero
David Rientjes [Wed, 26 Aug 2009 21:29:20 +0000 (14:29 -0700)]
flex_array: fix get function for elements in base starting at non-zero

If all array elements fit into the base structure and data is copied using
flex_array_put() starting at a non-zero index, flex_array_get() will fail
to return the data.

This fixes the bug by only checking for NULL parts when all elements do
not fit in the base structure when flex_array_get() is used.  Otherwise,
fa_element_to_part_nr() will always be 0 since there are no parts
structures needed and such element may never have been put.  Thus, it will
remain NULL due to the kzalloc() of the base.

Additionally, flex_array_put() now only checks for a NULL part when all
elements do not fit in the base structure.  This is otherwise unnecessary
since the base structure is guaranteed to exist (or we would have already
hit a NULL pointer).

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agopps: fix incorrect verdict check
Joonwoo Park [Wed, 26 Aug 2009 21:29:18 +0000 (14:29 -0700)]
pps: fix incorrect verdict check

Fix the incorrect verdict check and return an error if device_create()
fails; otherwise the driver triggers a kernel oops.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Cc: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>