commit 6a0a453177b4ed9de73a93af7f15473389b3248b Author: Greg Kroah-Hartman Date: Mon Jul 28 08:07:25 2014 -0700 Linux 3.14.14 commit 0163da2fc54be782c43476ebd68ced8a312e7e68 Author: Anton Kolesov Date: Fri Jun 20 20:28:39 2014 +0400 ARC: Implement ptrace(PTRACE_GET_THREAD_AREA) commit a4b6cb735b25aa84a462a1985e3e43bebaf5beb4 upstream. This patch adds implementation of GET_THREAD_AREA ptrace request type. This is required by GDB to debug NPTL applications. Signed-off-by: Anton Kolesov Signed-off-by: Vineet Gupta Signed-off-by: Greg Kroah-Hartman commit 15c37d7780a8a807d778c57ca647beb2c8df4e71 Author: Linus Torvalds Date: Sun Jun 8 14:17:00 2014 -0700 Don't trigger congestion wait on dirty-but-not-writeout pages commit b738d764652dc5aab1c8939f637112981fce9e0e upstream. shrink_inactive_list() used to wait 0.1s to avoid congestion when all the pages that were isolated from the inactive list were dirty but not under active writeback. That makes no real sense, and apparently causes major interactivity issues under some loads since 3.11. The ostensible reason for it was to wait for kswapd to start writing pages, but that seems questionable as well, since the congestion wait code seems to trigger for kswapd itself as well. Also, the logic behind delaying anything when we haven't actually started writeback is not clear - it only delays actually starting that writeback. We'll still trigger the congestion waiting if (a) the process is kswapd, and we hit pages flagged for immediate reclaim (b) the process is not kswapd, and the zone backing dev writeback is actually congested. This probably needs to be revisited, but as it is this fixes a reported regression. Reported-by: Felipe Contreras Pinpointed-by: Hillf Danton Cc: Michal Hocko Cc: Andrew Morton Cc: Mel Gorman Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman [mhocko@suse.cz: backport to 3.12 stable tree] Fixes: e2be15f6c3ee ('mm: vmscan: stall page reclaim and writeback pages based on dirty/writepage pages encountered') Reported-by: Felipe Contreras Pinpointed-by: Hillf Danton Cc: Michal Hocko Cc: Andrew Morton Cc: Mel Gorman Signed-off-by: Linus Torvalds Signed-off-by: Michal Hocko Signed-off-by: Greg Kroah-Hartman commit 3e34e4d85a3918cc4e22e160ee843d28d1954103 Author: Emmanuel Grumbach Date: Thu Jul 3 20:46:35 2014 +0300 iwlwifi: mvm: disable CTS to Self commit dc271ee0d04d12d6bfabacbec803289a7072fbd9 upstream. Firmware folks seem say that this flag can make trouble. Drop it. The advantage of CTS to self is that it slightly reduces the cost of the protection, but make the protection less reliable. Signed-off-by: Emmanuel Grumbach Signed-off-by: Greg Kroah-Hartman commit a8eec75aa5f7c47a1f00e10e97ecafb09442b191 Author: Marek Vasut Date: Fri Feb 28 12:58:41 2014 +0100 ARM: dts: imx: Add alias for ethernet controller commit 22970070e027cbbb9b2878f8f7c31d0d7f29e94d upstream. Add alias for FEC ethernet on i.MX to allow bootloaders (like U-Boot) patch-in the MAC address for FEC using this alias. Signed-off-by: Marek Vasut Signed-off-by: Shawn Guo Signed-off-by: Greg Kroah-Hartman commit 8d378bce5487a81db606c4c8b01305cb4b0fcc97 Author: Benjamin LaHaise Date: Mon Jul 14 12:49:26 2014 -0400 aio: protect reqs_available updates from changes in interrupt handlers commit 263782c1c95bbddbb022dc092fd89a36bb8d5577 upstream. As of commit f8567a3845ac05bb28f3c1b478ef752762bd39ef it is now possible to have put_reqs_available() called from irq context. While put_reqs_available() is per cpu, it did not protect itself from interrupts on the same CPU. This lead to aio_complete() corrupting the available io requests count when run under a heavy O_DIRECT workloads as reported by Robert Elliott. Fix this by disabling irq updates around the per cpu batch updates of reqs_available. Many thanks to Robert and folks for testing and tracking this down. Reported-by: Robert Elliot Tested-by: Robert Elliot Signed-off-by: Benjamin LaHaise Cc: Jens Axboe , Christoph Hellwig Signed-off-by: Greg Kroah-Hartman commit 2ef68c813d46c975f119f4a8d434dee56677f724 Author: Mateusz Guzik Date: Sat Jun 14 15:00:09 2014 +0200 sched: Fix possible divide by zero in avg_atom() calculation commit b0ab99e7736af88b8ac1b7ae50ea287fffa2badc upstream. proc_sched_show_task() does: if (nr_switches) do_div(avg_atom, nr_switches); nr_switches is unsigned long and do_div truncates it to 32 bits, which means it can test non-zero on e.g. x86-64 and be truncated to zero for division. Fix the problem by using div64_ul() instead. As a side effect calculations of avg_atom for big nr_switches are now correct. Signed-off-by: Mateusz Guzik Signed-off-by: Peter Zijlstra Cc: Linus Torvalds Link: http://lkml.kernel.org/r/1402750809-31991-1-git-send-email-mguzik@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 2e3ca7f6ea58cda872675259cdf9a566992bb6cf Author: Peter Zijlstra Date: Fri Jun 6 19:53:16 2014 +0200 locking/mutex: Disable optimistic spinning on some architectures commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream. The optimistic spin code assumes regular stores and cmpxchg() play nice; this is found to not be true for at least: parisc, sparc32, tile32, metag-lock1, arc-!llsc and hexagon. There is further wreckage, but this in particular seemed easy to trigger, so blacklist this. Opt in for known good archs. Signed-off-by: Peter Zijlstra Reported-by: Mikulas Patocka Cc: David Miller Cc: Chris Metcalf Cc: James Bottomley Cc: Vineet Gupta Cc: Jason Low Cc: Waiman Long Cc: "James E.J. Bottomley" Cc: Paul McKenney Cc: John David Anglin Cc: James Hogan Cc: Linus Torvalds Cc: Davidlohr Bueso Cc: Benjamin Herrenschmidt Cc: Catalin Marinas Cc: Russell King Cc: Will Deacon Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 9abefe6933601246129da98d6c16936547a5cc8e Author: Takashi Iwai Date: Tue Jul 15 08:51:27 2014 +0200 PM / sleep: Fix request_firmware() error at resume commit 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586 upstream. The commit [247bc037: PM / Sleep: Mitigate race between the freezer and request_firmware()] introduced the finer state control, but it also leads to a new bug; for example, a bug report regarding the firmware loading of intel BT device at suspend/resume: https://bugzilla.novell.com/show_bug.cgi?id=873790 The root cause seems to be a small window between the process resume and the clear of usermodehelper lock. The request_firmware() function checks the UMH lock and gives up when it's in UMH_DISABLE state. This is for avoiding the invalid f/w loading during suspend/resume phase. The problem is, however, that usermodehelper_enable() is called at the end of thaw_processes(). Thus, a thawed process in between can kick off the f/w loader code path (in this case, via btusb_setup_intel()) even before the call of usermodehelper_enable(). Then usermodehelper_read_trylock() returns an error and request_firmware() spews WARN_ON() in the end. This oneliner patch fixes the issue just by setting to UMH_FREEZING state again before restarting tasks, so that the call of request_firmware() will be blocked until the end of this function instead of returning an error. Fixes: 247bc0374254 (PM / Sleep: Mitigate race between the freezer and request_firmware()) Link: https://bugzilla.novell.com/show_bug.cgi?id=873790 Signed-off-by: Takashi Iwai Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit 5b7d98d5e2f7496ca2254b6d628f1158ada1e7c2 Author: Mike Snitzer Date: Mon Jul 14 16:59:39 2014 -0400 dm cache metadata: do not allow the data block size to change commit 048e5a07f282c57815b3901d4a68a77fa131ce0a upstream. The block size for the dm-cache's data device must remained fixed for the life of the cache. Disallow any attempt to change the cache's data block size. Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Signed-off-by: Greg Kroah-Hartman commit 60d97c8d656f4c37c0725081adc3cbe356d8bee5 Author: Mike Snitzer Date: Mon Jul 14 16:35:54 2014 -0400 dm thin metadata: do not allow the data block size to change commit 9aec8629ec829fc9403788cd959e05dd87988bd1 upstream. The block size for the thin-pool's data device must remained fixed for the life of the thin-pool. Disallow any attempt to change the thin-pool's data block size. It should be noted that attempting to change the data block size via thin-pool table reload will be ignored as a side-effect of the thin-pool handover that the thin-pool target does during thin-pool table reload. Here is an example outcome of attempting to load a thin-pool table that reduced the thin-pool's data block size from 1024K to 512K. Before: kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks After: kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object kernel: device-mapper: ioctl: error adding target to table Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Signed-off-by: Greg Kroah-Hartman commit f4a407b32de430e751a24e8d579af36ac74d99b3 Author: Ted Juan Date: Fri Jun 20 17:32:05 2014 +0800 mtd: devices: elm: fix elm_context_save() and elm_context_restore() functions commit 6938ad40cb97a52d88a763008935340729a4acc7 upstream. These two function's switch case lack the 'break' that make them always return error. Signed-off-by: Ted Juan Acked-by: Pekon Gupta Signed-off-by: Brian Norris Signed-off-by: Greg Kroah-Hartman commit 89d30c2107c85ce3ae1a267f1065aa0fb1e5b310 Author: Peter Zijlstra Date: Tue Jun 24 14:48:19 2014 +0200 x86, tsc: Fix cpufreq lockup commit 3896c329df8092661dac80f55a8c3f60136fd61a upstream. Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver locked up when doing cpu hotplug. Because we called set_cyc2ns_scale() from the time_cpufreq_notifier() unconditionally, it gets called multiple times for each freq change, instead of only the once, when the tsc_khz value actually changes. Because it gets called more than once, we run out of cyc2ns data slots and stall, waiting for a free one, but because we're half way offline, there's no consumers to free slots. By placing the call inside the condition that actually changes tsc_khz we avoid superfluous calls and avoid the problem. Reported-by: Mauro Tested-by: Mauro Fixes: 20d1c86a5776 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs") Cc: "Rafael J. Wysocki" Cc: Viresh Kumar Cc: Bin Gao Cc: Linus Torvalds Cc: Mika Westerberg Cc: Paul Gortmaker Cc: Stefani Seibold Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 025d494281c58dcee04fe3ee2b7fa300e150c95c Author: John Stultz Date: Mon Jul 7 14:06:11 2014 -0700 alarmtimer: Fix bug where relative alarm timers were treated as absolute commit 16927776ae757d0d132bdbfabbfe2c498342bd59 upstream. Sharvil noticed with the posix timer_settime interface, using the CLOCK_REALTIME_ALARM or CLOCK_BOOTTIME_ALARM clockid, if the users tried to specify a relative time timer, it would incorrectly be treated as absolute regardless of the state of the flags argument. This patch corrects this, properly checking the absolute/relative flag, as well as adds further error checking that no invalid flag bits are set. Reported-by: Sharvil Nanavati Signed-off-by: John Stultz Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Prarit Bhargava Cc: Sharvil Nanavati Link: http://lkml.kernel.org/r/1404767171-6902-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit a0c34f54dacf05e07de8d8d438a4e5848183da71 Author: Alex Deucher Date: Mon Jul 14 17:57:19 2014 -0400 drm/radeon: avoid leaking edid data commit 0ac66effe7fcdee55bda6d5d10d3372c95a41920 upstream. In some cases we fetch the edid in the detect() callback in order to determine what sort of monitor is connected. If that happens, don't fetch the edid again in the get_modes() callback or we will leak the edid. Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit e96faee7f217d9998e6602bc331780654fbdc023 Author: Jason Wang Date: Mon May 12 16:35:39 2014 +0800 drm/qxl: return IRQ_NONE if it was not our irq commit fbb60fe35ad579b511de8604b06a30b43846473b upstream. Return IRQ_NONE if it was not our irq. This is necessary for the case when qxl is sharing irq line with a device A in a crash kernel. If qxl is initialized before A and A's irq was raised during this gap, returning IRQ_HANDLED in this case will cause this irq to be raised again after EOI since kernel think it was handled but in fact it was not. Cc: Gerd Hoffmann Signed-off-by: Jason Wang Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman commit 345e666b2c8c8c78fbb5016f1807ef83279bece5 Author: Alex Deucher Date: Tue Jul 15 09:48:53 2014 -0400 drm/radeon: set default bl level to something reasonable commit 201bb62402e0227375c655446ea04fcd0acf7287 upstream. If the value in the scratch register is 0, set it to the max level. This fixes an issue where the console fb blanking code calls back into the backlight driver on unblank and then sets the backlight level to 0 after the driver has already set the mode and enabled the backlight. bugs: https://bugs.freedesktop.org/show_bug.cgi?id=81382 https://bugs.freedesktop.org/show_bug.cgi?id=70207 Signed-off-by: Alex Deucher Tested-by: David Heidelberger Signed-off-by: Greg Kroah-Hartman commit 59c04db2b06f70b883ff161586f915a2243887ef Author: Tomasz Figa Date: Thu Jul 17 17:23:44 2014 +0200 irqchip: gic: Fix core ID calculation when topology is read from DT commit 29e697b11853d3f83b1864ae385abdad4aa2c361 upstream. Certain GIC implementation, namely those found on earlier, single cluster, Exynos SoCs, have registers mapped without per-CPU banking, which means that the driver needs to use different offset for each CPU. Currently the driver calculates the offset by multiplying value returned by cpu_logical_map() by CPU offset parsed from DT. This is correct when CPU topology is not specified in DT and aforementioned function returns core ID alone. However when DT contains CPU topology, the function changes to return cluster ID as well, which is non-zero on mentioned SoCs and so breaks the calculation in GIC driver. This patch fixes this by masking out cluster ID in CPU offset calculation so that only core ID is considered. Multi-cluster Exynos SoCs already have banked GIC implementations, so this simple fix should be enough. Reported-by: Lorenzo Pieralisi Reported-by: Bartlomiej Zolnierkiewicz Signed-off-by: Tomasz Figa Fixes: db0d4db22a78d ("ARM: gic: allow GIC to support non-banked setups") Link: https://lkml.kernel.org/r/1405610624-18722-1-git-send-email-t.figa@samsung.com Signed-off-by: Jason Cooper Signed-off-by: Greg Kroah-Hartman commit ddd78c85d594a0d4223e87cb5152d21c98732ae1 Author: Suravee Suthikulpanit Date: Tue Jul 15 00:03:03 2014 +0200 irqchip: gic: Add binding probe for ARM GIC400 commit 144cb08864ed44be52d8634ac69cd98e5efcf527 upstream. Commit 3ab72f9156bb "dt-bindings: add GIC-400 binding" added the "arm,gic-400" compatible string, but the corresponding IRQCHIP_DECLARE was never added to the gic driver. Therefore add the missing irqchip declaration for it. Signed-off-by: Suravee Suthikulpanit Signed-off-by: Greg Kroah-Hartman Removed additional empty line and adapted commit message to mark it as fixing an issue. Signed-off-by: Heiko Stuebner Acked-by: Will Deacon Fixes: 3ab72f9156bb ("dt-bindings: add GIC-400 binding") Link: https://lkml.kernel.org/r/2621565.f5eISveXXJ@diego Signed-off-by: Jason Cooper commit 2d2f8e6fbc209e8797572f00a3f7142d739ac170 Author: Matthias Brugger Date: Thu Jul 3 13:58:52 2014 +0200 irqchip: gic: Add support for cortex a7 compatible string commit a97e8027b1d28eafe6bafe062556c1ec926a49c6 upstream. Patch 0a68214b "ARM: DT: Add binding for GIC virtualization extentions (VGIC)" added the "arm,cortex-a7-gic" compatible string, but the corresponding IRQCHIP_DECLARE was never added to the gic driver. To let real Cortex-A7 SoCs use it, add the necessary declaration to the device driver. Signed-off-by: Matthias Brugger Link: https://lkml.kernel.org/r/1404388732-28890-1-git-send-email-matthias.bgg@gmail.com Fixes: 0a68214b76ca ("ARM: DT: Add binding for GIC virtualization extentions (VGIC)") Signed-off-by: Jason Cooper Signed-off-by: Greg Kroah-Hartman commit d9d98e024c1cac5b04c5a4a3460889591f138d7a Author: Martin Lau Date: Mon Jun 9 23:06:42 2014 -0700 ring-buffer: Fix polling on trace_pipe commit 97b8ee845393701edc06e27ccec2876ff9596019 upstream. ring_buffer_poll_wait() should always put the poll_table to its wait_queue even there is immediate data available. Otherwise, the following epoll and read sequence will eventually hang forever: 1. Put some data to make the trace_pipe ring_buffer read ready first 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) 3. epoll_wait() 4. read(trace_pipe_fd) till EAGAIN 5. Add some more data to the trace_pipe ring_buffer 6. epoll_wait() -> this epoll_wait() will block forever ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, ring_buffer_poll_wait() returns immediately without adding poll_table, which has poll_table->_qproc pointing to ep_poll_callback(), to its wait_queue. ~ During the epoll_wait() call in step 3 and step 6, ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue because the poll_table->_qproc is NULL and it is how epoll works. ~ When there is new data available in step 6, ring_buffer does not know it has to call ep_poll_callback() because it is not in its wait queue. Hence, block forever. Other poll implementation seems to call poll_wait() unconditionally as the very first thing to do. For example, tcp_poll() in tcp.c. Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com Fixes: 2a2cc8f7c4d0 "ftrace: allow the event pipe to be polled" Reviewed-by: Chris Mason Signed-off-by: Martin Lau Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit eb1141c7d950e2f13638587d43b6cc50764b3a80 Author: Amitkumar Karwar Date: Fri Jun 20 11:45:25 2014 -0700 mwifiex: fix Tx timeout issue commit d76744a93246eccdca1106037e8ee29debf48277 upstream. https://bugzilla.kernel.org/show_bug.cgi?id=70191 https://bugzilla.kernel.org/show_bug.cgi?id=77581 It is observed that sometimes Tx packet is downloaded without adding driver's txpd header. This results in firmware parsing garbage data as packet length. Sometimes firmware is unable to read the packet if length comes out as invalid. This stops further traffic and timeout occurs. The root cause is uninitialized fields in tx_info(skb->cb) of packet used to get garbage values. In this case if MWIFIEX_BUF_FLAG_REQUEUED_PKT flag is mistakenly set, txpd header was skipped. This patch makes sure that tx_info is correctly initialized to fix the problem. Reported-by: Andrew Wiley Reported-by: Linus Gasser Reported-by: Michael Hirsch Tested-by: Xinming Hu Signed-off-by: Amitkumar Karwar Signed-off-by: Maithili Hinge Signed-off-by: Avinash Patil Signed-off-by: Bing Zhao Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit 6ed7d8044bc7ac58a25dfb2a7220aa18874e9437 Author: HATAYAMA Daisuke Date: Wed Jun 25 10:09:07 2014 +0900 perf/x86/intel: ignore CondChgd bit to avoid false NMI handling commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f upstream. Currently, any NMI is falsely handled by a NMI handler of NMI watchdog if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set. For example, we use external NMI to make system panic to get crash dump, but in this case, the external NMI is falsely handled do to the issue. This commit deals with the issue simply by ignoring CondChgd bit. Here is explanation in detail. On x86 NMI watchdog uses performance monitoring feature to periodically signal NMI each time performance counter gets overflowed. intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI handler of NMI watchdog, perf_event_nmi_handler(). It identifies an owner of a given NMI by looking at overflow status bits in MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it handles the given NMI as its own NMI. The problem is that the intel_pmu_handle_irq() doesn't distinguish CondChgd bit from other bits. Unlike the other status bits, CondChgd bit doesn't represent overflow status for performance counters. Thus, CondChgd bit cannot be thought of as a mark indicating a given NMI is NMI watchdog's. As a result, if CondChgd bit is set, any NMI is falsely handled by the NMI handler of NMI watchdog. Also, if type of the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding action is never performed until CondChgd bit is cleared. I noticed this behavior on systems with Ivy Bridge processors: Intel Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set in the beginning at boot. Then the CondChgd bit is immediately cleared by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0. On the other hand, on older processors such as Nehalem, Xeon E7540, CondChgd bit is not set in the beginning at boot. I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out that, the descriptions I found are: In 18.9.1: "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to indicate changes to the state of performancmonitoring hardware" In Table 35-2 IA-32 Architectural MSRs 63 CondChg: status bits of this register has changed. These are different from the bahviour I see on the actual system as I explained above. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. Signed-off-by: HATAYAMA Daisuke Acked-by: Don Zickus Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit a874a875b978fc05d0e58a514e714c1719a0bfb3 Author: Jiri Olsa Date: Tue Jun 24 10:20:25 2014 +0200 perf: Do not allow optimized switch for non-cloned events commit 1f9a7268c67f0290837aada443d28fd953ddca90 upstream. The context check in perf_event_context_sched_out allows non-cloned context to be part of the optimized schedule out switch. This could move non-cloned context into another workload child. Once this child exits, the context is closed and leaves all original (parent) events in closed state. Any other new cloned event will have closed state and not measure anything. And probably causing other odd bugs. Signed-off-by: Jiri Olsa Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Frederic Weisbecker Cc: Namhyung Kim Cc: Paul Mackerras Cc: Corey Ashford Cc: David Ahern Cc: Jiri Olsa Cc: Linus Torvalds Link: http://lkml.kernel.org/r/1403598026-2310-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 93baec32588c63bcef26e8f06a1b1e081d6d1752 Author: Eric Dumazet Date: Mon Jul 21 07:17:42 2014 +0200 ipv4: fix buffer overflow in ip_options_compile() [ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ] There is a benign buffer overflow in ip_options_compile spotted by AddressSanitizer[1] : Its benign because we always can access one extra byte in skb->head (because header is followed by struct skb_shared_info), and in this case this byte is not even used. [28504.910798] ================================================================== [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile [28504.913170] Read of size 1 by thread T15843: [28504.914026] [] ip_options_compile+0x121/0x9c0 [28504.915394] [] ip_options_get_from_user+0xad/0x120 [28504.916843] [] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.918175] [] ip_setsockopt+0x30/0xa0 [28504.919490] [] tcp_setsockopt+0x5b/0x90 [28504.920835] [] sock_common_setsockopt+0x5f/0x70 [28504.922208] [] SyS_setsockopt+0xa2/0x140 [28504.923459] [] system_call_fastpath+0x16/0x1b [28504.924722] [28504.925106] Allocated by thread T15843: [28504.925815] [] ip_options_get_from_user+0x35/0x120 [28504.926884] [] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.927975] [] ip_setsockopt+0x30/0xa0 [28504.929175] [] tcp_setsockopt+0x5b/0x90 [28504.930400] [] sock_common_setsockopt+0x5f/0x70 [28504.931677] [] SyS_setsockopt+0xa2/0x140 [28504.932851] [] system_call_fastpath+0x16/0x1b [28504.934018] [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right [28504.934377] of 40-byte region [ffff880026382800, ffff880026382828) [28504.937144] [28504.937474] Memory state around the buggy address: [28504.938430] ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr [28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.945573] ^ [28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.099804] Legend: [28505.100269] f - 8 freed bytes [28505.100884] r - 8 redzone bytes [28505.101649] . - 8 allocated bytes [28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes [28505.103637] ================================================================== [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 78ed26dcac7230682bc03337602538c5d500313e Author: Ben Hutchings Date: Mon Jul 21 00:06:48 2014 +0100 dns_resolver: Null-terminate the right string [ Upstream commit 640d7efe4c08f06c4ae5d31b79bd8740e7f6790a ] *_result[len] is parsed as *(_result[len]) which is not at all what we want to touch here. Signed-off-by: Ben Hutchings Fixes: 84a7c0b1db1c ("dns_resolver: assure that dns_query() result is null-terminated") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b477b4df82ea86127be9f5ab834290bf1d7f17df Author: Manuel Schölling Date: Sat Jun 7 23:57:25 2014 +0200 dns_resolver: assure that dns_query() result is null-terminated [ Upstream commit 84a7c0b1db1c17d5ded8d3800228a608e1070b40 ] dns_query() credulously assumes that keys are null-terminated and returns a copy of a memory block that is off by one. Signed-off-by: Manuel Schölling Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e8d7581a37f781789e9ba5e50aac5b3bbe4f2ec7 Author: Bjørn Mork Date: Thu Jul 17 13:34:09 2014 +0200 net: huawei_cdc_ncm: add "subclass 3" devices [ Upstream commit c2a6c7813f1ffae636e369b5d7011c9f518d3cd9 ] Huawei's usage of the subclass and protocol fields is not 100% clear to us, but there appears to be a very strict system. A device with the "shared" device ID 12d1:1506 and this NCM function was recently reported (showing only default altsetting): Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 3 bInterfaceProtocol 22 iInterface 8 CDC Network Control Model (NCM) ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 06 24 1a 00 01 1f ** UNRECOGNIZED: 0c 24 1b 00 01 00 04 10 14 dc 05 20 ** UNRECOGNIZED: 0d 24 0f 0a 0f 00 00 00 ea 05 03 00 01 ** UNRECOGNIZED: 05 24 06 01 01 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x85 EP 5 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0010 1x 16 bytes bInterval 9 Cc: Enrico Mioso Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4b4190fa097abdeee96137c52cc65c76a1ca8e48 Author: Sowmini Varadhan Date: Wed Jul 16 10:02:26 2014 -0400 sunvnet: clean up objects created in vnet_new() on vnet_exit() [ Upstream commit a4b70a07ed12a71131cab7adce2ce91c71b37060 ] Nothing cleans up the objects created by vnet_new(), they are completely leaked. vnet_exit(), after doing the vio_unregister_driver() to clean up ports, should call a helper function that iterates over vnet_list and cleans up those objects. This includes unregister_netdevice() as well as free_netdev(). Signed-off-by: Sowmini Varadhan Acked-by: Dave Kleikamp Reviewed-by: Karl Volz Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9b6cdedfa3cfc62b2edf4d4abbda2263bd062684 Author: Jerry Chu Date: Mon Jul 14 15:54:46 2014 -0700 net-gre-gro: Fix a bug that breaks the forwarding path [ Upstream commit c3caf1192f904de2f1381211f564537235d50de3 ] Fixed a bug that was introduced by my GRE-GRO patch (bf5a755f5e9186406bbf50f4087100af5bd68e40 net-gre-gro: Add GRE support to the GRO stack) that breaks the forwarding path because various GSO related fields were not set. The bug will cause on the egress path either the GSO code to fail, or a GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The following fix has been tested for both cases. Signed-off-by: H.K. Jerry Chu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dab45ea356389bfed8dde8c3566151f1d26f34d8 Author: Nikolay Aleksandrov Date: Sun Jul 13 09:47:47 2014 +0200 bonding: fix ad_select module param check [ Upstream commit 548d28bd0eac840d122b691279ce9f4ce6ecbfb6 ] Obvious copy/paste error when I converted the ad_select to the new option API. "lacp_rate" there should be "ad_select" so we can get the proper value. CC: Jay Vosburgh CC: Veaceslav Falico CC: Andy Gospodarek CC: David S. Miller Fixes: 9e5f5eebe765 ("bonding: convert ad_select to use the new option API") Reported-by: Karim Scheik Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c271f855d7ad3be668ff221b4b225838aae6f39c Author: Christoph Schulz Date: Sun Jul 13 00:53:15 2014 +0200 net: pppoe: use correct channel MTU when using Multilink PPP [ Upstream commit a8a3e41c67d24eb12f9ab9680cbb85e24fcd9711 ] The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see ppp_generic module) tries to determine how big a fragment might be. According to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the corresponding comment and code in ppp_mp_explode(): /* * hdrlen includes the 2-byte PPP protocol field, but the * MTU counts only the payload excluding the protocol field. * (RFC1661 Section 2) */ mtu = pch->chan->mtu - (hdrlen - 2); However, the pppoe module *does* include the PPP protocol field in the channel MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under certain circumstances (one byte if PPP protocol compression is used, two otherwise), causing the generated Ethernet packets to be dropped. So the pppoe module has to subtract two bytes from the channel MTU. This error only manifests itself when using Multilink PPP, as otherwise the channel MTU is not used anywhere. In the following, I will describe how to reproduce this bug. We configure two pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is computed by adding the two link MTUs and subtracting the MP header twice, which is 4 bytes long.) The necessary pppd statements on both sides are "multilink mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server side, we additionally need to start two pppoe-server instances to be able to establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU of the PPP network interface to the MRRU (2976) on both sides of the connection in order to make use of the higher bandwidth. (If we didn't do that, IP fragmentation would kick in, which we want to avoid.) Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to server over the PPP link. This results in the following network packet: 2948 (echo payload) + 8 (ICMPv4 header) + 20 (IPv4 header) --------------------- 2976 (PPP payload) These 2976 bytes do not exceed the MTU of the PPP network interface, so the IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode() prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger than the negotiated MRRU. So this packet would have to be divided in three fragments. But this does not happen as each link MTU is assumed to be two bytes larger. So this packet is diveded into two fragments only, one of size 1489 and one of size 1488. Now we have for that bigger fragment: 1489 (PPP payload) + 4 (MP header) + 2 (PPP protocol field for the MP payload (0x3d)) + 6 (PPPoE header) -------------------------- 1501 (Ethernet payload) This packet exceeds the link MTU and is discarded. If one configures the link MTU on the client side to 1501, one can see the discarded Ethernet frames with tcpdump running on the client. A ping -s 2948 -c 1 192.168.15.254 leads to the smaller fragment that is correctly received on the server side: (tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d) 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000, Flags [end], length 1492 and to the bigger fragment that is not received on the server side: (tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d) 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1515: PPPoE [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000, Flags [begin], length 1493 With the patch below, we correctly obtain three fragments: 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [begin], length 1492 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [none], length 1492 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 27: PPPoE [ses 0x1] MLPPP (0x003d), length 7: seq 0x000, Flags [end], length 5 And the ICMPv4 echo request is successfully received at the server side: IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1), length 2976) 192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0, length 2956 The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698 ("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies to 3.10 upwards but the fix can be applied (with minor modifications) to kernels as old as 2.6.32. Signed-off-by: Christoph Schulz Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 87c6cfe711b95a31f9e840e6911b4905e317cb09 Author: Daniel Borkmann Date: Sat Jul 12 20:30:35 2014 +0200 net: sctp: fix information leaks in ulpevent layer [ Upstream commit 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea ] While working on some other SCTP code, I noticed that some structures shared with user space are leaking uninitialized stack or heap buffer. In particular, struct sctp_sndrcvinfo has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when putting this into cmsg. But also struct sctp_remote_error contains a 2 bytes hole that we don't fill but place into a skb through skb_copy_expand() via sctp_ulpevent_make_remote_error(). Both structures are defined by the IETF in RFC6458: * Section 5.3.2. SCTP Header Information Structure: The sctp_sndrcvinfo structure is defined below: struct sctp_sndrcvinfo { uint16_t sinfo_stream; uint16_t sinfo_ssn; uint16_t sinfo_flags; <-- 2 bytes hole --> uint32_t sinfo_ppid; uint32_t sinfo_context; uint32_t sinfo_timetolive; uint32_t sinfo_tsn; uint32_t sinfo_cumtsn; sctp_assoc_t sinfo_assoc_id; }; * 6.1.3. SCTP_REMOTE_ERROR: A remote peer may send an Operation Error message to its peer. This message indicates a variety of error conditions on an association. The entire ERROR chunk as it appears on the wire is included in an SCTP_REMOTE_ERROR event. Please refer to the SCTP specification [RFC4960] and any extensions for a list of possible error formats. An SCTP error notification has the following format: struct sctp_remote_error { uint16_t sre_type; uint16_t sre_flags; uint32_t sre_length; uint16_t sre_error; <-- 2 bytes hole --> sctp_assoc_t sre_assoc_id; uint8_t sre_data[]; }; Fix this by setting both to 0 before filling them out. We also have other structures shared between user and kernel space in SCTP that contains holes (e.g. struct sctp_paddrthlds), but we copy that buffer over from user space first and thus don't need to care about it in that cases. While at it, we can also remove lengthy comments copied from the draft, instead, we update the comment with the correct RFC number where one can look it up. Signed-off-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f0f998cc4bfa3caf5225afe6813306f53f30e0de Author: Jon Paul Maloy Date: Fri Jul 11 08:45:27 2014 -0400 tipc: clear 'next'-pointer of message fragments before reassembly [ Upstream commit 999417549c16dd0e3a382aa9f6ae61688db03181 ] If the 'next' pointer of the last fragment buffer in a message is not zeroed before reassembly, we risk ending up with a corrupt message, since the reassembly function itself isn't doing this. Currently, when a buffer is retrieved from the deferred queue of the broadcast link, the next pointer is not cleared, with the result as described above. This commit corrects this, and thereby fixes a bug that may occur when long broadcast messages are transmitted across dual interfaces. The bug has been present since 40ba3cdf542a469aaa9083fa041656e59b109b90 ("tipc: message reassembly using fragment chain") This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1787bcf56b11846e2bc53e8e544a649d0f739798 Author: Suresh Reddy Date: Fri Jul 11 14:03:01 2014 +0530 be2net: set EQ DB clear-intr bit in be_open() [ Upstream commit 4cad9f3b61c7268fa89ab8096e23202300399b5d ] On BE3, if the clear-interrupt bit of the EQ doorbell is not set the first time it is armed, ocassionally we have observed that the EQ doesn't raise anymore interrupts even if it is in armed state. This patch fixes this by setting the clear-interrupt bit when EQs are armed for the first time in be_open(). Signed-off-by: Suresh Reddy Signed-off-by: Sathya Perla Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 664a42a1186047c6e6277570dd4342c88914b96c Author: Ben Pfaff Date: Wed Jul 9 10:31:22 2014 -0700 netlink: Fix handling of error from netlink_dump(). [ Upstream commit ac30ef832e6af0505b6f0251a6659adcfa74975e ] netlink_dump() returns a negative errno value on error. Until now, netlink_recvmsg() directly recorded that negative value in sk->sk_err, but that's wrong since sk_err takes positive errno values. (This manifests as userspace receiving a positive return value from the recv() system call, falsely indicating success.) This bug was introduced in the commit that started checking the netlink_dump() return value, commit b44d211 (netlink: handle errors from netlink_dump()). Multithreaded Netlink dumps are one way to trigger this behavior in practice, as described in the commit message for the userspace workaround posted here: http://openvswitch.org/pipermail/dev/2014-June/042339.html This commit also fixes the same bug in netlink_poll(), introduced in commit cd1df525d (netlink: add flow control for memory mapped I/O). Signed-off-by: Ben Pfaff Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b4ca5bfee380af0bd89683bd73485357dd22dab3 Author: Thomas Fitzsimmons Date: Tue Jul 8 19:44:07 2014 -0400 net: mvneta: Fix big endian issue in mvneta_txq_desc_csum() [ Upstream commit 0a1985879437d14bda8c90d0dae3455c467d7642 ] This commit fixes the command value generated for CSUM calculation when running in big endian mode. The Ethernet protocol ID for IP was being unconditionally byte-swapped in the layer 3 protocol check (with swab16), which caused the mvneta driver to not function correctly in big endian mode. This patch byte-swaps the ID conditionally with htons. Cc: # v3.13+ Signed-off-by: Thomas Fitzsimmons Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bb2e73b108e6140acc8572a28683cf6f3e20cf96 Author: Thomas Petazzoni Date: Tue Jul 8 10:49:43 2014 +0200 net: mvneta: fix operation in 10 Mbit/s mode [ Upstream commit 4d12bc63ab5e48c1d78fa13883cf6fefcea3afb1 ] As reported by Maggie Mae Roxas, the mvneta driver doesn't behave properly in 10 Mbit/s mode. This is due to a misconfiguration of the MVNETA_GMAC_AUTONEG_CONFIG register: bit MVNETA_GMAC_CONFIG_MII_SPEED must be set for a 100 Mbit/s speed, but cleared for a 10 Mbit/s speed, which the driver was not properly doing. This commit adjusts that by setting the MVNETA_GMAC_CONFIG_MII_SPEED bit only in 100 Mbit/s mode, and relying on the fact that all the speed related bits of this register are cleared at the beginning of the mvneta_adjust_link() function. This problem exists since c5aff18204da0 ("net: mvneta: driver for Marvell Armada 370/XP network unit") which is the commit that introduced the mvneta driver in the kernel. Cc: # v3.8+ Fixes: c5aff18204da0 ("net: mvneta: driver for Marvell Armada 370/XP network unit") Reported-by: Maggie Mae Roxas Cc: Maggie Mae Roxas Signed-off-by: Thomas Petazzoni Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 62d7774d1b3288346c951952141bca7245ca0cf6 Author: Andrey Utkin Date: Mon Jul 7 23:22:50 2014 +0300 appletalk: Fix socket referencing in skb [ Upstream commit 36beddc272c111689f3042bf3d10a64d8a805f93 ] Setting just skb->sk without taking its reference and setting a destructor is invalid. However, in the places where this was done, skb is used in a way not requiring skb->sk setting. So dropping the setting of skb->sk. Thanks to Eric Dumazet for correct solution. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441 Reported-by: Ed Martin Signed-off-by: Andrey Utkin Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a429801f9e5eb209f6d4706c559f3510b94404e9 Author: Yuchung Cheng Date: Wed Jul 2 12:07:16 2014 -0700 tcp: fix false undo corner cases [ Upstream commit 6e08d5e3c8236e7484229e46fdf92006e1dd4c49 ] The undo code assumes that, upon entering loss recovery, TCP 1) always retransmit something 2) the retransmission never fails locally (e.g., qdisc drop) so undo_marker is set in tcp_enter_recovery() and undo_retrans is incremented only when tcp_retransmit_skb() is successful. When the assumption is broken because TCP's cwnd is too small to retransmit or the retransmit fails locally. The next (DUP)ACK would incorrectly revert the cwnd and the congestion state in tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs may enter the recovery state. The sender repeatedly enter and (incorrectly) exit recovery states if the retransmits continue to fail locally while receiving (DUP)ACKs. The fix is to initialize undo_retrans to -1 and start counting on the first retransmission. Always increment undo_retrans even if the retransmissions fail locally because they couldn't cause DSACKs to undo the cwnd reduction. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8b5933c07b843d33fb240f6189890fa00d199a5a Author: dingtianhong Date: Wed Jul 2 13:50:48 2014 +0800 igmp: fix the problem when mc leave group [ Upstream commit 52ad353a5344f1f700c5b777175bdfa41d3cd65a ] The problem was triggered by these steps: 1) create socket, bind and then setsockopt for add mc group. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("192.168.1.2"); setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)); 2) drop the mc group for this socket. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("0.0.0.0"); setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)); 3) and then drop the socket, I found the mc group was still used by the dev: netstat -g Interface RefCnt Group --------------- ------ --------------------- eth2 1 255.0.0.37 Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need to be released for the netdev when drop the socket, but this process was broken when route default is NULL, the reason is that: The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr is NULL, the default route dev will be chosen, then the ifindex is got from the dev, then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL, the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be released from the mc_list, but the dev didn't dec the refcnt for this mc group, so when dropping the socket, the mc_list is NULL and the dev still keep this group. v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs, so I add the checking for the in_dev before polling the mc_list, make sure when we remove the mc group, dec the refcnt to the real dev which was using the mc address. The problem would never happened again. Signed-off-by: Ding Tianhong Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3dc0c70ff2c78f2ba57692d4c88d39522c972ba6 Author: Loic Prylli Date: Tue Jul 1 21:39:43 2014 -0700 net: Fix NETDEV_CHANGE notifier usage causing spurious arp flush [ Upstream commit 54951194656e4853e441266fd095f880bc0398f3 ] A bug was introduced in NETDEV_CHANGE notifier sequence causing the arp table to be sometimes spuriously cleared (including manual arp entries marked permanent), upon network link carrier changes. The changed argument for the notifier was applied only to a single caller of NETDEV_CHANGE, missing among others netdev_state_change(). So upon net_carrier events induced by the network, which are triggering a call to netdev_state_change(), arp_netdev_event() would decide whether to clear or not arp cache based on random/junk stack values (a kind of read buffer overflow). Fixes: be9efd365328 ("net: pass changed flags along with NETDEV_CHANGE event") Fixes: 6c8b4e3ff81b ("arp: flush arp cache on IFF_NOARP change") Signed-off-by: Loic Prylli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ef6357dd643d3cfe192973119eb6aadd6b9f5c66 Author: Bjørn Mork Date: Thu Jul 17 13:33:51 2014 +0200 net: qmi_wwan: add two Sierra Wireless/Netgear devices [ Upstream commit 5343330010a892b76a97fd93ad3c455a4a32a7fb ] Add two device IDs found in an out-of-tree driver downloadable from Netgear. Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 673ef269d23b03df8ba99a71cb6e66aaf6789c1a Author: Bernd Wachter Date: Tue Jul 1 22:01:09 2014 +0300 net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2 [ Upstream commit 8dcb4b1526747d8431f9895e153dd478c9d16186 ] There's a new version of the Telewell 4G modem working with, but not recognized by this driver. Signed-off-by: Bernd Wachter Acked-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 926565a7ffbe945c69cee9b5268e43da72679153 Author: Edward Allcutt Date: Mon Jun 30 16:16:02 2014 +0100 ipv4: icmp: Fix pMTU handling for rare case [ Upstream commit 68b7107b62983f2cff0948292429d5f5999df096 ] Some older router implementations still send Fragmentation Needed errors with the Next-Hop MTU field set to zero. This is explicitly described as an eventuality that hosts must deal with by the standard (RFC 1191) since older standards specified that those bits must be zero. Linux had a generic (for all of IPv4) implementation of the algorithm described in the RFC for searching a list of MTU plateaus for a good value. Commit 46517008e116 ("ipv4: Kill ip_rt_frag_needed().") removed this as part of the changes to remove the routing cache. Subsequently any Fragmentation Needed packet with a zero Next-Hop MTU has been discarded without being passed to the per-protocol handlers or notifying userspace for raw sockets. When there is a router which does not implement RFC 1191 on an MTU limited path then this results in stalled connections since large packets are discarded and the local protocols are not notified so they never attempt to lower the pMTU. One example I have seen is an OpenBSD router terminating IPSec tunnels. It's worth pointing out that this case is distinct from the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU since the commit in question dismissed that as a valid concern. All of the per-protocols handlers implement the simple approach from RFC 1191 of immediately falling back to the minimum value. Although this is sub-optimal it is vastly preferable to connections hanging indefinitely. Remove the Next-Hop MTU != 0 check and allow such packets to follow the normal path. Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().") Signed-off-by: Edward Allcutt Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit adc78978cc0b857cf32abaf8b28b3328a171297d Author: Christoph Paasch Date: Sat Jun 28 18:26:37 2014 +0200 tcp: Fix divide by zero when pushing during tcp-repair [ Upstream commit 5924f17a8a30c2ae18d034a86ee7581b34accef6 ] When in repair-mode and TCP_RECV_QUEUE is set, we end up calling tcp_push with mss_now being 0. If data is in the send-queue and tcp_set_skb_tso_segs gets called, we crash because it will divide by mss_now: [ 347.151939] divide error: 0000 [#1] SMP [ 347.152907] Modules linked in: [ 347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4 [ 347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000 [ 347.152907] EIP: 0060:[] EFLAGS: 00210246 CPU: 1 [ 347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0 [ 347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000 [ 347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00 [ 347.152907] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0 [ 347.152907] Stack: [ 347.152907] c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080 [ 347.152907] f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85 [ 347.152907] c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8 [ 347.152907] Call Trace: [ 347.152907] [] ? apic_timer_interrupt+0x2d/0x34 [ 347.152907] [] tcp_init_tso_segs+0x36/0x50 [ 347.152907] [] tcp_write_xmit+0x7a/0xbf0 [ 347.152907] [] ? up+0x28/0x40 [ 347.152907] [] ? console_unlock+0x295/0x480 [ 347.152907] [] ? vprintk_emit+0x1ef/0x4b0 [ 347.152907] [] __tcp_push_pending_frames+0x36/0xd0 [ 347.152907] [] tcp_push+0xf0/0x120 [ 347.152907] [] tcp_sendmsg+0xf1/0xbf0 [ 347.152907] [] ? kmem_cache_free+0xf0/0x120 [ 347.152907] [] ? __sigqueue_free+0x32/0x40 [ 347.152907] [] ? __sigqueue_free+0x32/0x40 [ 347.152907] [] ? do_wp_page+0x3e0/0x850 [ 347.152907] [] inet_sendmsg+0x4a/0xb0 [ 347.152907] [] ? handle_mm_fault+0x709/0xfb0 [ 347.152907] [] sock_aio_write+0xbb/0xd0 [ 347.152907] [] do_sync_write+0x69/0xa0 [ 347.152907] [] vfs_write+0x123/0x160 [ 347.152907] [] SyS_write+0x55/0xb0 [ 347.152907] [] sysenter_do_call+0x12/0x28 This can easily be reproduced with the following packetdrill-script (the "magic" with netem, sk_pacing and limit_output_bytes is done to prevent the kernel from pushing all segments, because hitting the limit without doing this is not so easy with packetdrill): 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 +0 > S. 0:0(0) ack 1 +0.1 < . 1:1(0) ack 1 win 65000 +0 accept(3, ..., ...) = 4 // This forces that not all segments of the snd-queue will be pushed +0 `tc qdisc add dev tun0 root netem delay 10ms` +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2` +0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0 +0 write(4,...,10000) = 10000 +0 write(4,...,10000) = 10000 // Set tcp-repair stuff, particularly TCP_RECV_QUEUE +0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0 +0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0 // This now will make the write push the remaining segments +0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0 +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000` // Now we will crash +0 write(4,...,1000) = 1000 This happens since ec3423257508 (tcp: fix retransmission in repair mode). Prior to that, the call to tcp_push was prevented by a check for tp->repair. The patch fixes it, by adding the new goto-label out_nopush. When exiting tcp_sendmsg and a push is not required, which is the case for tp->repair, we go to this label. When repairing and calling send() with TCP_RECV_QUEUE, the data is actually put in the receive-queue. So, no push is required because no data has been added to the send-queue. Cc: Andrew Vagin Cc: Pavel Emelyanov Fixes: ec3423257508 (tcp: fix retransmission in repair mode) Signed-off-by: Christoph Paasch Acked-by: Andrew Vagin Acked-by: Pavel Emelyanov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 406f49c9d0f3469aae6332a2e3ad665a25bb1ae7 Author: Eric Dumazet Date: Thu Jun 26 00:44:02 2014 -0700 bnx2x: fix possible panic under memory stress [ Upstream commit 07b0f00964def8af9321cfd6c4a7e84f6362f728 ] While it is legal to kfree(NULL), it is not wise to use : put_page(virt_to_head_page(NULL)) BUG: unable to handle kernel paging request at ffffeba400000000 IP: [] virt_to_head_page+0x36/0x44 [bnx2x] Reported-by: Michel Lespinasse Signed-off-by: Eric Dumazet Cc: Ariel Elior Fixes: d46d132cc021 ("bnx2x: use netdev_alloc_frag()") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 45917cc8c0dd959e57cc5b754295fa527bcb7baa Author: Eric Dumazet Date: Wed Jul 2 02:25:15 2014 -0700 vlan: free percpu stats in device destructor [ Upstream commit a48e5fafecfb9c0c807d7e7284b5ff884dfb7a3a ] Madalin-Cristian reported crashs happening after a recent commit (5a4ae5f6e7d4 "vlan: unnecessary to check if vlan_pcpu_stats is NULL") ----------------------------------------------------------------------- root@p5040ds:~# vconfig add eth8 1 root@p5040ds:~# vconfig rem eth8.1 Unable to handle kernel paging request for data at address 0x2bc88028 Faulting instruction address: 0xc058e950 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=8 CoreNet Generic Modules linked in: CPU: 3 PID: 2167 Comm: vconfig Tainted: G W 3.16.0-rc3-00346-g65e85bf #2 task: e7264d90 ti: e2c2c000 task.ti: e2c2c000 NIP: c058e950 LR: c058ea30 CTR: c058e900 REGS: e2c2db20 TRAP: 0300 Tainted: G W (3.16.0-rc3-00346-g65e85bf) MSR: 00029002 CR: 48000428 XER: 20000000 DEAR: 2bc88028 ESR: 00000000 GPR00: c047299c e2c2dbd0 e7264d90 00000000 2bc88000 00000000 ffffffff 00000000 GPR08: 0000000f 00000000 000000ff 00000000 28000422 10121928 10100000 10100000 GPR16: 10100000 00000000 c07c5968 00000000 00000000 00000000 e2c2dc48 e7838000 GPR24: c07c5bac c07c58a8 e77290cc c07b0000 00000000 c05de6c0 e7838000 e2c2dc48 NIP [c058e950] vlan_dev_get_stats64+0x50/0x170 LR [c058ea30] vlan_dev_get_stats64+0x130/0x170 Call Trace: [e2c2dbd0] [ffffffea] 0xffffffea (unreliable) [e2c2dc20] [c047299c] dev_get_stats+0x4c/0x140 [e2c2dc40] [c0488ca8] rtnl_fill_ifinfo+0x3d8/0x960 [e2c2dd70] [c0489f4c] rtmsg_ifinfo+0x6c/0x110 [e2c2dd90] [c04731d4] rollback_registered_many+0x344/0x3b0 [e2c2ddd0] [c047332c] rollback_registered+0x2c/0x50 [e2c2ddf0] [c0476058] unregister_netdevice_queue+0x78/0xf0 [e2c2de00] [c058d800] unregister_vlan_dev+0xc0/0x160 [e2c2de20] [c058e360] vlan_ioctl_handler+0x1c0/0x550 [e2c2de90] [c045d11c] sock_ioctl+0x28c/0x2f0 [e2c2deb0] [c010d070] do_vfs_ioctl+0x90/0x7b0 [e2c2df20] [c010d7d0] SyS_ioctl+0x40/0x80 [e2c2df40] [c000f924] ret_from_syscall+0x0/0x3c Fix this problem by freeing percpu stats from dev->destructor() instead of ndo_uninit() Reported-by: Madalin-Cristian Bucur Signed-off-by: Eric Dumazet Tested-by: Madalin-Cristian Bucur Fixes: 5a4ae5f6e7d4 ("vlan: unnecessary to check if vlan_pcpu_stats is NULL") Cc: Li RongQing Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fde40bdfda59c317d9b130b21a569f3285e269e4 Author: Eric Dumazet Date: Wed Jul 2 02:39:38 2014 -0700 net: fix sparse warning in sk_dst_set() [ Upstream commit 5925a0555bdaf0b396a84318cbc21ba085f6c0d3 ] sk_dst_cache has __rcu annotation, so we need a cast to avoid following sparse error : include/net/sock.h:1774:19: warning: incorrect type in initializer (different address spaces) include/net/sock.h:1774:19: expected struct dst_entry [noderef] *__ret include/net/sock.h:1774:19: got struct dst_entry *dst Signed-off-by: Eric Dumazet Reported-by: kbuild test robot Fixes: 7f502361531e ("ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cd6893fae616cb894b661a3919ecb74c8ea4c34d Author: Eric Dumazet Date: Mon Jun 30 01:26:23 2014 -0700 ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix [ Upstream commit 7f502361531e9eecb396cf99bdc9e9a59f7ebd7f ] We have two different ways to handle changes to sk->sk_dst First way (used by TCP) assumes socket lock is owned by caller, and use no extra lock : __sk_dst_set() & __sk_dst_reset() Another way (used by UDP) uses sk_dst_lock because socket lock is not always taken. Note that sk_dst_lock is not softirq safe. These ways are not inter changeable for a given socket type. ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used the socket lock as synchronization, but users might be UDP sockets. Instead of converting sk_dst_lock to a softirq safe version, use xchg() as we did for sk_rx_dst in commit e47eb5dfb296b ("udp: ipv4: do not use sk_dst_lock from softirq context") In a follow up patch, we probably can remove sk_dst_lock, as it is only used in IPv6. Signed-off-by: Eric Dumazet Cc: Steffen Klassert Fixes: 9cb3a50c5f63e ("ipv4: Invalidate the socket cached route on pmtu events if possible") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit df90819dafcd6b97fc665f63a15752a570e227a2 Author: Eric Dumazet Date: Tue Jun 24 10:05:11 2014 -0700 ipv4: fix dst race in sk_dst_get() [ Upstream commit f88649721268999bdff09777847080a52004f691 ] When IP route cache had been removed in linux-3.6, we broke assumption that dst entries were all freed after rcu grace period. DST_NOCACHE dst were supposed to be freed from dst_release(). But it appears we want to keep such dst around, either in UDP sockets or tunnels. In sk_dst_get() we need to make sure dst refcount is not 0 before incrementing it, or else we might end up freeing a dst twice. DST_NOCACHE set on a dst does not mean this dst can not be attached to a socket or a tunnel. Then, before actual freeing, we need to observe a rcu grace period to make sure all other cpus can catch the fact the dst is no longer usable. Signed-off-by: Eric Dumazet Reported-by: Dormando Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 10bc26ecaf0e7e80c14b98375bbfe85f977a53f2 Author: Wei-Chun Chao Date: Sun Jun 8 23:48:54 2014 -0700 net: fix UDP tunnel GSO of frag_list GRO packets [ Upstream commit 5882a07c72093dc3a18e2d2b129fb200686bb6ee ] This patch fixes a kernel BUG_ON in skb_segment. It is hit when testing two VMs on openvswitch with one VM acting as VXLAN gateway. During VXLAN packet GSO, skb_segment is called with skb->data pointing to inner TCP payload. skb_segment calls skb_network_protocol to retrieve the inner protocol. skb_network_protocol actually expects skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN. This ends up pulling in ETH_HLEN data from header tail. As a result, pskb_trim logic is skipped and BUG_ON is hit later. Move skb_push in front of skb_network_protocol so that skb->data lines up properly. kernel BUG at net/core/skbuff.c:2999! Call Trace: [] tcp_gso_segment+0x122/0x410 [] inet_gso_segment+0x13c/0x390 [] skb_mac_gso_segment+0x9b/0x170 [] skb_udp_tunnel_segment+0xd8/0x390 [] udp4_ufo_fragment+0x120/0x140 [] inet_gso_segment+0x13c/0x390 [] ? default_wake_function+0x12/0x20 [] skb_mac_gso_segment+0x9b/0x170 [] __skb_gso_segment+0x60/0xc0 [] dev_hard_start_xmit+0x183/0x550 [] sch_direct_xmit+0xfe/0x1d0 [] __dev_queue_xmit+0x214/0x4f0 [] dev_queue_xmit+0x10/0x20 [] ip_finish_output+0x66b/0x890 [] ip_output+0x58/0x90 [] ? fib_table_lookup+0x29f/0x350 [] ip_local_out_sk+0x39/0x50 [] iptunnel_xmit+0x10d/0x130 [] vxlan_xmit_skb+0x1d0/0x330 [vxlan] [] vxlan_tnl_send+0x129/0x1a0 [openvswitch] [] ovs_vport_send+0x26/0xa0 [openvswitch] [] do_output+0x2e/0x50 [openvswitch] Signed-off-by: Wei-Chun Chao Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 71dd984cd40c7baf1c61a2b515f7f1a811ff87f3 Author: Bjørn Mork Date: Wed Jun 18 14:21:24 2014 +0200 net: huawei_cdc_ncm: increase command buffer size [ Upstream commit 3acc74619b0175b7a154cf8dc54813f6faf97aa9 ] Messages from the modem exceeding 256 bytes cause communication failure. The WDM protocol is strictly "read on demand", meaning that we only poll for unread data after receiving a notification from the modem. Since we have no way to know how much data the modem has to send, we must make sure that the buffer we provide is "big enough". Message truncation does not work. Truncated messages are left unread until the modem has another message to send. Which often won't happen until the userspace application has given up waiting for the final part of the last message, and therefore sends another command. With a proper CDC WDM function there is a descriptor telling us which buffer size the modem uses. But with this vendor specific implementation there is no known way to calculate the exact "big enough" number. It is an unknown property of the modem firmware. Experience has shown that 256 is too small. The discussion of this failure ended up concluding that 512 might be too small as well. So 1024 seems like a reasonable value for now. Fixes: 41c47d8cfd68 ("net: huawei_cdc_ncm: Introduce the huawei_cdc_ncm driver") Cc: Enrico Mioso Reported-by: Dan Williams Signed-off-by: Bjørn Mork Acked-By: Enrico Mioso Tested-by: Dan Williams Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 73035bc31e9d63e6e4004bb31d4534e9c755ac30 Author: Li RongQing Date: Wed Jun 18 13:46:02 2014 +0800 8021q: fix a potential memory leak [ Upstream commit 916c1689a09bc1ca81f2d7a34876f8d35aadd11b ] skb_cow called in vlan_reorder_header does not free the skb when it failed, and vlan_reorder_header returns NULL to reset original skb when it is called in vlan_untag, lead to a memory leak. Signed-off-by: Li RongQing Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d6d1d62a3900eb3f0d9660326fa985229be7830c Author: Daniel Borkmann Date: Wed Jun 18 23:46:31 2014 +0200 net: sctp: check proc_dointvec result in proc_sctp_do_auth [ Upstream commit 24599e61b7552673dd85971cf5a35369cd8c119e ] When writing to the sysctl field net.sctp.auth_enable, it can well be that the user buffer we handed over to proc_dointvec() via proc_sctp_do_auth() handler contains something other than integers. In that case, we would set an uninitialized 4-byte value from the stack to net->sctp.auth_enable that can be leaked back when reading the sysctl variable, and it can unintentionally turn auth_enable on/off based on the stack content since auth_enable is interpreted as a boolean. Fix it up by making sure proc_dointvec() returned sucessfully. Fixes: b14878ccb7fa ("net: sctp: cache auth_enable per endpoint") Reported-by: Florian Westphal Signed-off-by: Daniel Borkmann Acked-by: Neil Horman Acked-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 62bce51b8a81a3a325c194afd27d69507b1e9b8e Author: Neal Cardwell Date: Wed Jun 18 21:15:03 2014 -0400 tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb [ Upstream commit 2cd0d743b05e87445c54ca124a9916f22f16742e ] If there is an MSS change (or misbehaving receiver) that causes a SACK to arrive that covers the end of an skb but is less than one MSS, then tcp_match_skb_to_sack() was rounding up pkt_len to the full length of the skb ("Round if necessary..."), then chopping all bytes off the skb and creating a zero-byte skb in the write queue. This was visible now because the recently simplified TLP logic in bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte skb at the end of the write queue, and now that we do not check that skb's length we could send it as a TLP probe. Consider the following example scenario: mss: 1000 skb: seq: 0 end_seq: 4000 len: 4000 SACK: start_seq: 3999 end_seq: 4000 The tcp_match_skb_to_sack() code will compute: in_sack = false pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000 new_len += mss = 4000 Previously we would find the new_len > skb->len check failing, so we would fall through and set pkt_len = new_len = 4000 and chop off pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment afterward in the write queue. With this new commit, we notice that the new new_len >= skb->len check succeeds, so that we return without trying to fragment. Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries") Reported-by: Eric Dumazet Signed-off-by: Neal Cardwell Cc: Eric Dumazet Cc: Yuchung Cheng Cc: Ilpo Jarvinen Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 38d1c951cb8810953cadeb6a8e66fabde3de1c89 Author: Daniel Borkmann Date: Thu Jun 19 01:31:30 2014 +0200 net: sctp: propagate sysctl errors from proc_do* properly [ Upstream commit ff5e92c1affe7166b3f6e7073e648ed65a6e2e59 ] sysctl handler proc_sctp_do_hmac_alg(), proc_sctp_do_rto_min() and proc_sctp_do_rto_max() do not properly reflect some error cases when writing values via sysctl from internal proc functions such as proc_dointvec() and proc_dostring(). In all these cases we pass the test for write != 0 and partially do additional work just to notice that additional sanity checks fail and we return with hard-coded -EINVAL while proc_do* functions might also return different errors. So fix this up by simply testing a successful return of proc_do* right after calling it. This also allows to propagate its return value onwards to the user. While touching this, also fix up some minor style issues. Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl") Fixes: 3c68198e7511 ("sctp: Make hmac algorithm selection for cookie generation dynamic") Signed-off-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 49f5947b8a69e5fce9ef98429113fffde4a62b5a Author: Tyler Hall Date: Sun Jun 15 22:23:17 2014 -0400 slcan: Port write_wakeup deadlock fix from slip [ Upstream commit a8e83b17536aad603fbeae4c460f2da0ee9fe6ed ] The commit "slip: Fix deadlock in write_wakeup" fixes a deadlock caused by a change made in both slcan and slip. This is a direct port of that fix. Signed-off-by: Tyler Hall Cc: Oliver Hartkopp Cc: Andre Naujoks Cc: David S. Miller Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9f06f0f45843164eb9ff100c9f4ba430ff2326d5 Author: Tyler Hall Date: Sun Jun 15 22:23:16 2014 -0400 slip: Fix deadlock in write_wakeup [ Upstream commit 661f7fda21b15ec52f57fcd397c03370acc28688 ] Use schedule_work() to avoid potentially taking the spinlock in interrupt context. Commit cc9fa74e2a ("slip/slcan: added locking in wakeup function") added necessary locking to the wakeup function and 367525c8c2/ddcde142be ("can: slcan: Fix spinlock variant") converted it to spin_lock_bh() because the lock is also taken in timers. Disabling softirqs is not sufficient, however, as tty drivers may call write_wakeup from interrupt context. This driver calls tty->ops->write() with its spinlock held, which may immediately cause an interrupt on the same CPU and subsequent spin_bug(). Simply converting to spin_lock_irq/irqsave() prevents this deadlock, but causes lockdep to point out a possible circular locking dependency between these locks: (&(&sl->lock)->rlock){-.....}, at: slip_write_wakeup (&port_lock_key){-.....}, at: serial8250_handle_irq.part.13 The slip transmit is holding the slip spinlock when calling the tty write. This grabs the port lock. On an interrupt, the handler grabs the port lock and calls write_wakeup which grabs the slip lock. This could be a problem if a serial interrupt occurs on another CPU during the slip transmit. To deal with these issues, don't grab the lock in the wakeup function by deferring the writeout to a workqueue. Also hold the lock during close when de-assigning the tty pointer to safely disarm the worker and timers. This bug is easily reproducible on the first transmit when slip is used with the standard 8250 serial driver. [] (spin_bug+0x0/0x38) from [] (do_raw_spin_lock+0x60/0x1d0) r5:eab27000 r4:ec02754c [] (do_raw_spin_lock+0x0/0x1d0) from [] (_raw_spin_lock+0x28/0x2c) r10:0000001f r9:eabb814c r8:eabb8140 r7:40070193 r6:ec02754c r5:eab27000 r4:ec02754c r3:00000000 [] (_raw_spin_lock+0x0/0x2c) from [] (slip_write_wakeup+0x50/0xe0 [slip]) r4:ec027540 r3:00000003 [] (slip_write_wakeup+0x0/0xe0 [slip]) from [] (tty_wakeup+0x48/0x68) r6:00000000 r5:ea80c480 r4:eab27000 r3:bf3a01d0 [] (tty_wakeup+0x0/0x68) from [] (uart_write_wakeup+0x2c/0x30) r5:ed68ea90 r4:c06790d8 [] (uart_write_wakeup+0x0/0x30) from [] (serial8250_tx_chars+0x114/0x170) [] (serial8250_tx_chars+0x0/0x170) from [] (serial8250_handle_irq+0xa0/0xbc) r6:000000c2 r5:00000060 r4:c06790d8 r3:00000000 [] (serial8250_handle_irq+0x0/0xbc) from [] (dw8250_handle_irq+0x38/0x64) r7:00000000 r6:edd2f390 r5:000000c2 r4:c06790d8 [] (dw8250_handle_irq+0x0/0x64) from [] (serial8250_interrupt+0x44/0xc4) r6:00000000 r5:00000000 r4:c06791c4 r3:c029336c [] (serial8250_interrupt+0x0/0xc4) from [] (handle_irq_event_percpu+0xb4/0x2b0) r10:c06790d8 r9:eab27000 r8:00000000 r7:00000000 r6:0000001f r5:edd52980 r4:ec53b6c0 r3:c028d2b0 [] (handle_irq_event_percpu+0x0/0x2b0) from [] (handle_irq_event+0x4c/0x6c) r10:c06790d8 r9:eab27000 r8:c0673ae0 r7:c05c2020 r6:ec53b6c0 r5:edd529d4 r4:edd52980 [] (handle_irq_event+0x0/0x6c) from [] (handle_level_irq+0xe8/0x100) r6:00000000 r5:edd529d4 r4:edd52980 r3:00022000 [] (handle_level_irq+0x0/0x100) from [] (generic_handle_irq+0x30/0x40) r5:0000001f r4:0000001f [] (generic_handle_irq+0x0/0x40) from [] (handle_IRQ+0xd0/0x13c) r4:ea997b18 r3:000000e0 [] (handle_IRQ+0x0/0x13c) from [] (armada_370_xp_handle_irq+0x4c/0x118) r8:000003ff r7:ea997b18 r6:ffffffff r5:60070013 r4:c0674dc0 [] (armada_370_xp_handle_irq+0x0/0x118) from [] (__irq_svc+0x40/0x70) Exception stack(0xea997b18 to 0xea997b60) 7b00: 00000001 20070013 7b20: 00000000 0000000b 20070013 eab27000 20070013 00000000 ed10103e eab27000 7b40: c06790d8 ea997b74 ea997b60 ea997b60 c04186c0 c04186c8 60070013 ffffffff r9:eab27000 r8:ed10103e r7:ea997b4c r6:ffffffff r5:60070013 r4:c04186c8 [] (_raw_spin_unlock_irqrestore+0x0/0x54) from [] (uart_start+0x40/0x44) r4:c06790d8 r3:c028ddd8 [] (uart_start+0x0/0x44) from [] (uart_write+0xe4/0xf4) r6:0000003e r5:00000000 r4:ed68ea90 r3:0000003e [] (uart_write+0x0/0xf4) from [] (sl_xmit+0x1c4/0x228 [slip]) r10:ed388e60 r9:0000003c r8:ffffffdd r7:0000003e r6:ec02754c r5:ea717eb8 r4:ec027000 [] (sl_xmit+0x0/0x228 [slip]) from [] (dev_hard_start_xmit+0x39c/0x6d0) r8:eaf163c0 r7:ec027000 r6:ea717eb8 r5:00000000 r4:00000000 Signed-off-by: Tyler Hall Cc: Oliver Hartkopp Cc: Andre Naujoks Cc: David S. Miller Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 66e530f3a718e896289e742fcfec57a471965227 Author: Dmitry Popov Date: Sat Jul 5 02:26:37 2014 +0400 ip_tunnel: fix ip_tunnel_lookup [ Upstream commit e0056593b61253f1a8a9941dacda22e73b963cdc ] This patch fixes 3 similar bugs where incoming packets might be routed into wrong non-wildcard tunnels: 1) Consider the following setup: ip address add 1.1.1.1/24 dev eth0 ip address add 1.1.1.2/24 dev eth0 ip tunnel add ipip1 remote 2.2.2.2 local 1.1.1.1 mode ipip dev eth0 ip link set ipip1 up Incoming ipip packets from 2.2.2.2 were routed into ipip1 even if it has dst = 1.1.1.2. Moreover even if there was wildcard tunnel like ip tunnel add ipip0 remote 2.2.2.2 local any mode ipip dev eth0 but it was created before explicit one (with local 1.1.1.1), incoming ipip packets with src = 2.2.2.2 and dst = 1.1.1.2 were still routed into ipip1. Same issue existed with all tunnels that use ip_tunnel_lookup (gre, vti) 2) ip address add 1.1.1.1/24 dev eth0 ip tunnel add ipip1 remote 2.2.146.85 local 1.1.1.1 mode ipip dev eth0 ip link set ipip1 up Incoming ipip packets with dst = 1.1.1.1 were routed into ipip1, no matter what src address is. Any remote ip address which has ip_tunnel_hash = 0 raised this issue, 2.2.146.85 is just an example, there are more than 4 million of them. And again, wildcard tunnel like ip tunnel add ipip0 remote any local 1.1.1.1 mode ipip dev eth0 wouldn't be ever matched if it was created before explicit tunnel like above. Gre & vti tunnels had the same issue. 3) ip address add 1.1.1.1/24 dev eth0 ip tunnel add gre1 remote 2.2.146.84 local 1.1.1.1 key 1 mode gre dev eth0 ip link set gre1 up Any incoming gre packet with key = 1 were routed into gre1, no matter what src/dst addresses are. Any remote ip address which has ip_tunnel_hash = 0 raised the issue, 2.2.146.84 is just an example, there are more than 4 million of them. Wildcard tunnel like ip tunnel add gre2 remote any local any key 1 mode gre dev eth0 wouldn't be ever matched if it was created before explicit tunnel like above. All this stuff happened because while looking for a wildcard tunnel we didn't check that matched tunnel is a wildcard one. Fixed. Signed-off-by: Dmitry Popov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8d114c86d88ee7dc52760b585f4abdf358fabca1 Author: David Ertman Date: Wed Mar 5 07:50:46 2014 +0000 e1000e: Fix SHRA register access for 82579 commit 96dee024ca4799d6d21588951240035c21ba1c67 upstream. Previous commit c3a0dce35af0 fixed an overrun for the RAR on i218 devices. This commit also attempted to homogenize the RAR/SHRA access for all parts accessed by the e1000e driver. This change introduced an error for assigning MAC addresses to guest OS's for 82579 devices. Only RAR[0] is accessible to the driver for 82579 parts, and additional addresses must be placed into the SHRA[L|H] registers. The rar_entry_count was changed in the previous commit to an inaccurate value that accounted for all RAR and SHRA registers, not just the ones usable by the driver. This patch fixes the count to the correct value and adjusts the e1000_rar_set_pch2lan() function to user the correct index. Cc: John Greene Signed-off-by: Dave Ertman Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Cc: "Alexander Y. Fomichev" Signed-off-by: Greg Kroah-Hartman commit 087c99b15f48e476bcdcdef2ed4c5d145c1c88dd Author: Hugh Dickins Date: Wed Jul 23 14:00:13 2014 -0700 shmem: fix splicing from a hole while it's punched commit b1a366500bd537b50c3aad26dc7df083ec03a448 upstream. shmem_fault() is the actual culprit in trinity's hole-punch starvation, and the most significant cause of such problems: since a page faulted is one that then appears page_mapped(), needing unmap_mapping_range() and i_mmap_mutex to be unmapped again. But it is not the only way in which a page can be brought into a hole in the radix_tree while that hole is being punched; and Vlastimil's testing implies that if enough other processors are busy filling in the hole, then shmem_undo_range() can be kept from completing indefinitely. shmem_file_splice_read() is the main other user of SGP_CACHE, which can instantiate shmem pagecache pages in the read-only case (without holding i_mutex, so perhaps concurrently with a hole-punch). Probably it's silly not to use SGP_READ already (using the ZERO_PAGE for holes): which ought to be safe, but might bring surprises - not a change to be rushed. shmem_read_mapping_page_gfp() is an internal interface used by drivers/gpu/drm GEM (and next by uprobes): it should be okay. And shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when called internally by the kernel (perhaps for a stacking filesystem, which might rely on holes to be reserved): it's unclear whether it could be provoked to keep hole-punch busy or not. We could apply the same umbrella as now used in shmem_fault() to shmem_file_splice_read() and the others; but it looks ugly, and use over a range raises questions - should it actually be per page? can these get starved themselves? The origin of this part of the problem is my v3.1 commit d0823576bf4b ("mm: pincer in truncate_inode_pages_range"), once it was duplicated into shmem.c. It seemed like a nice idea at the time, to ensure (barring RCU lookup fuzziness) that there's an instant when the entire hole is empty; but the indefinitely repeated scans to ensure that make it vulnerable. Revert that "enhancement" to hole-punch from shmem_undo_range(), but retain the unproblematic rescanning when it's truncating; add a couple of comments there. Remove the "indices[0] >= end" test: that is now handled satisfactorily by the inner loop, and mem_cgroup_uncharge_start()/end() are too light to be worth avoiding here. But if we do not always loop indefinitely, we do need to handle the case of swap swizzled back to page before shmem_free_swap() gets it: add a retry for that case, as suggested by Konstantin Khlebnikov; and for the case of page swizzled back to swap, as suggested by Johannes Weiner. Signed-off-by: Hugh Dickins Reported-by: Sasha Levin Suggested-by: Vlastimil Babka Cc: Konstantin Khlebnikov Cc: Johannes Weiner Cc: Lukas Czerner Cc: Dave Jones Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 5e9a01c58c62ec8e546253d724b4c4910eea32e0 Author: Hugh Dickins Date: Wed Jul 23 14:00:10 2014 -0700 shmem: fix faulting into a hole, not taking i_mutex commit 8e205f779d1443a94b5ae81aa359cb535dd3021e upstream. Commit f00cdc6df7d7 ("shmem: fix faulting into a hole while it's punched") was buggy: Sasha sent a lockdep report to remind us that grabbing i_mutex in the fault path is a no-no (write syscall may already hold i_mutex while faulting user buffer). We tried a completely different approach (see following patch) but that proved inadequate: good enough for a rational workload, but not good enough against trinity - which forks off so many mappings of the object that contention on i_mmap_mutex while hole-puncher holds i_mutex builds into serious starvation when concurrent faults force the puncher to fall back to single-page unmap_mapping_range() searches of the i_mmap tree. So return to the original umbrella approach, but keep away from i_mutex this time. We really don't want to bloat every shmem inode with a new mutex or completion, just to protect this unlikely case from trinity. So extend the original with wait_queue_head on stack at the hole-punch end, and wait_queue item on the stack at the fault end. This involves further use of i_lock to guard against the races: lockdep has been happy so far, and I see fs/inode.c:unlock_new_inode() holds i_lock around wake_up_bit(), which is comparable to what we do here. i_lock is more convenient, but we could switch to shmem's info->lock. This issue has been tagged with CVE-2014-4171, which will require commit f00cdc6df7d7 and this and the following patch to be backported: we suggest to 3.1+, though in fact the trinity forkbomb effect might go back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might not, since much has changed, with i_mmap_mutex a spinlock before 3.0. Anyone running trinity on 3.0 and earlier? I don't think we need care. Signed-off-by: Hugh Dickins Reported-by: Sasha Levin Tested-by: Sasha Levin Cc: Vlastimil Babka Cc: Konstantin Khlebnikov Cc: Johannes Weiner Cc: Lukas Czerner Cc: Dave Jones Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit dd78e88404ef8091f5f0132a53fdc084b3a6080b Author: Hugh Dickins Date: Mon Jun 23 13:22:06 2014 -0700 shmem: fix faulting into a hole while it's punched commit f00cdc6df7d7cfcabb5b740911e6788cb0802bdb upstream. Trinity finds that mmap access to a hole while it's punched from shmem can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE) from completing, until the reader chooses to stop; with the puncher's hold on i_mutex locking out all other writers until it can complete. It appears that the tmpfs fault path is too light in comparison with its hole-punching path, lacking an i_data_sem to obstruct it; but we don't want to slow down the common case. Extend shmem_fallocate()'s existing range notification mechanism, so shmem_fault() can refrain from faulting pages into the hole while it's punched, waiting instead on i_mutex (when safe to sleep; or repeatedly faulting when not). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hugh Dickins Reported-by: Sasha Levin Tested-by: Sasha Levin Cc: Dave Jones Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 99a8321dc06a1b88ca1c25fef7f9908d9960736d Author: Emmanuel Grumbach Date: Wed Jun 25 09:12:30 2014 +0300 iwlwifi: dvm: don't enable CTS to self commit 43d826ca5979927131685cc2092c7ce862cb91cd upstream. We should always prefer to use full RTS protection. Using CTS to self gives a meaningless improvement, but this flow is much harder for the firmware which is likely to have issues with it. Signed-off-by: Emmanuel Grumbach Signed-off-by: Greg Kroah-Hartman commit 307c6d3e9045c7f0258e8540169c574fd46581ab Author: Oren Givon Date: Sun May 25 16:31:58 2014 +0300 iwlwifi: update the 7265 series HW IDs commit b3c063ae7279981f7161e63b44f214c62f122b32 upstream. Add one more 7265 series HW ID. Edit one existing 7265 series HW ID. Signed-off-by: Oren Givon Signed-off-by: Emmanuel Grumbach Signed-off-by: Greg Kroah-Hartman commit 7836db47d6b664c77d7ebdea2e91e776caac4c34 Author: Niu Yawei Date: Wed Jun 4 12:22:13 2014 +0800 quota: missing lock in dqcache_shrink_scan() commit d68aab6b8f572406aa93b45ef6483934dd3b54a6 upstream. Commit 1ab6c4997e04 (fs: convert fs shrinkers to new scan/count API) accidentally removed locking from quota shrinker. Fix it - dqcache_shrink_scan() should use dq_list_lock to protect the scan on free_dquots list. Fixes: 1ab6c4997e04a00c50c6d786c2f046adc0d1f5de Signed-off-by: Niu Yawei Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit f09e424e436c56fd8c4ebd182d407851b7f42d83 Author: Stefan Assmann Date: Thu Jul 10 03:29:39 2014 -0700 igb: do a reset on SR-IOV re-init if device is down commit 76252723e88681628a3dbb9c09c963e095476f73 upstream. To properly re-initialize SR-IOV it is necessary to reset the device even if it is already down. Not doing this may result in Tx unit hangs. Signed-off-by: Stefan Assmann Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 081caead26b5841c597d520be8ff1e2f69b4fadc Author: Todd Fujinaka Date: Thu Jul 10 01:47:15 2014 -0700 igb: Workaround for i210 Errata 25: Slow System Clock commit 948264879b6894dc389a44b99fae4f0b72932619 upstream. On some devices, the internal PLL circuit occasionally provides the wrong clock frequency after power up. The probability of failure is less than one failure per 1000 power cycles. When the failure occurs, the internal clock frequency is around 1/20 of the correct frequency. Signed-off-by: Todd Fujinaka Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ee0af1ad5166963cbe11d3cd3c58a9386a60665f Author: Guenter Roeck Date: Wed Jul 16 17:40:31 2014 -0700 hwmon: (adt7470) Fix writes to temperature limit registers commit de12d6f4b10b21854441f5242dcb29ea96181e58 upstream. Temperature limit registers are signed. Limits therefore need to be clamped to (-128, 127) degrees C and not to (0, 255) degrees C. Without this fix, writing a limit of 128 degrees C sets the actual limit to -128 degrees C. Signed-off-by: Guenter Roeck Reviewed-by: Axel Lin Signed-off-by: Greg Kroah-Hartman commit 2fd37381b1f09aa129c1a67bfe883b8f656c3ef0 Author: Axel Lin Date: Wed Jul 9 09:18:59 2014 +0800 hwmon: (da9052) Don't use dash in the name attribute commit ee14b644daaa58afe1e91bb9ebd9cf1b18d1f5fa upstream. Dashes are not allowed in hwmon name attributes. Use "da9052" instead of "da9052-hwmon". Signed-off-by: Axel Lin Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 572ebe4fadcaefa2196c17adc168d8b2e2ce274f Author: Axel Lin Date: Wed Jul 9 09:22:54 2014 +0800 hwmon: (da9055) Don't use dash in the name attribute commit 6b00f440dd678d786389a7100a2e03fe44478431 upstream. Dashes are not allowed in hwmon name attributes. Use "da9055" instead of "da9055-hwmon". Signed-off-by: Axel Lin Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 3fcfc776812efe9eec484d4798ad63bfe31fe152 Author: David Vrabel Date: Fri Jun 27 10:42:03 2014 +0100 xen/balloon: set ballooned out pages as invalid in p2m commit fb9a0c443691ceaab3daba966bbbd9f5ff3aa26f upstream. Since cd9151e26d31048b2b5e00fd02e110e07d2200c9 (xen/balloon: set a mapping for ballooned out pages), a ballooned out page had its entry in the p2m set to the MFN of one of the scratch pages. This means that the p2m will contain many entries pointing to the same MFN. During a domain save, these many-to-one entries are not identified as such and the scratch page is saved multiple times. On restore the ballooned pages are populated with new frames and the domain may use up its allocation before all pages can be restored. Since the original fix only needed to keep a mapping for the ballooned page it is safe to set ballooned out pages as INVALID_P2M_ENTRY in the p2m (as they were before). Thus preventing them from being saved and re-populated on restore. Signed-off-by: David Vrabel Reported-by: Marek Marczykowski Tested-by: Marek Marczykowski Acked-by: Stefano Stabellini Signed-off-by: Greg Kroah-Hartman commit 985615b27de9a693d8015d1bfe32d35b24133b9c Author: zhangwei(Jovi) Date: Thu Jul 18 16:31:18 2013 +0800 tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs commit f0160a5a2912267c02cfe692eac955c360de5fdf upstream. The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing, so add it, to be consistent with __trace_printk/__trace_bprintk. Those functions are all called by the same function: trace_printk(). Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com Signed-off-by: zhangwei(Jovi) Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 0c82c4cc9a9bf733e765d4471926d5a2d93bfeee Author: zhangwei(Jovi) Date: Thu Jul 18 16:31:05 2013 +0800 tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs commit 8abfb8727f4a724d31f9ccfd8013fbd16d539445 upstream. Currently trace option stacktrace is not applicable for trace_printk with constant string argument, the reason is in __trace_puts/__trace_bputs ftrace_trace_stack is missing. In contrast, when using trace_printk with non constant string argument(will call into __trace_printk/__trace_bprintk), then trace option stacktrace is workable, this inconstant result will confuses users a lot. Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com Signed-off-by: zhangwei(Jovi) Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 45ed61bc4fe2bb437f042420501e3980cf1ff3f6 Author: Steven Rostedt (Red Hat) Date: Tue Jul 15 11:05:12 2014 -0400 tracing: Fix graph tracer with stack tracer on other archs commit 5f8bf2d263a20b986225ae1ed7d6759dc4b93af9 upstream. Running my ftrace tests on PowerPC, it failed the test that checks if function_graph tracer is affected by the stack tracer. It was. Looking into this, I found that the update_function_graph_func() must be called even if the trampoline function is not changed. This is because archs like PowerPC do not support ftrace_ops being passed by assembly and instead uses a helper function (what the trampoline function points to). Since this function is not changed even when multiple ftrace_ops are added to the code, the test that falls out before calling update_function_graph_func() will miss that the update must still be done. Call update_function_graph_function() for all calls to update_ftrace_function() Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 86fb4c8cc75ef7c1c9805bedf01b6a57d3d624aa Author: Oleg Nesterov Date: Fri Jul 11 21:06:38 2014 +0200 tracing: instance_rmdir() leaks ftrace_event_file->filter commit 2448e3493cb3874baa90725c87869455ebf11cd2 upstream. instance_rmdir() path destroys the event files but forgets to free file->filter. Change remove_event_file_dir() to free_event_filter(). Link: http://lkml.kernel.org/p/20140711190638.GA19517@redhat.com Cc: Masami Hiramatsu Cc: Namhyung Kim Cc: Srikar Dronamraju Cc: Tom Zanussi Cc: "zhangwei(Jovi)" Fixes: f6a84bdc75b5 "tracing: Introduce remove_event_file_dir()" Signed-off-by: Oleg Nesterov Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 78107336ce02a2d8e53c322f795f218351514aa0 Author: Srinivas Pandruvada Date: Thu Aug 7 22:03:00 2014 +0100 iio:core: Handle error when mask type is not separate commit 78b3321610bf920d7fceb1a0236faa881be0bcf3 upstream. When event spec is shared by multiple channels, which has definition for mask_shared_by_type, iio_device_register_eventset fails. For example: static const struct iio_event_spec iio_dummy_events[] = { { .type = IIO_EV_TYPE_THRESH, .dir = IIO_EV_DIR_RISING, .mask_separate = BIT(IIO_EV_INFO_ENABLE), .mask_shared_by_type = BIT(IIO_EV_INFO_VALUE), }, { .type = IIO_EV_TYPE_THRESH, .dir = IIO_EV_DIR_FALLING, .mask_separate = BIT(IIO_EV_INFO_ENABLE),a .mask_shared_by_type = BIT(IIO_EV_INFO_VALUE), } }; If two channels use this event spec, this will result in error. This change handles EBUSY error similar to iio_device_add_info_mask_type(). Signed-off-by: Srinivas Pandruvada Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 1dda7e9997208e5274f32ee991ca16a2cdb4a9b7 Author: Anand Avati Date: Thu Jun 26 20:21:57 2014 -0400 fuse: ignore entry-timeout on LOOKUP_REVAL commit 154210ccb3a871e631bf39fdeb7a8731d98af87b upstream. The following test case demonstrates the bug: sh# mount -t glusterfs localhost:meta-test /mnt/one sh# mount -t glusterfs localhost:meta-test /mnt/two sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; echo stuff > /mnt/one/file bash: /mnt/one/file: Stale file handle sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; sleep 1; echo stuff > /mnt/one/file On the second open() on /mnt/one, FUSE would have used the old nodeid (file handle) trying to re-open it. Gluster is returning -ESTALE. The ESTALE propagates back to namei.c:filename_lookup() where lookup is re-attempted with LOOKUP_REVAL. The right behavior now, would be for FUSE to ignore the entry-timeout and and do the up-call revalidation. Instead FUSE is ignoring LOOKUP_REVAL, succeeding the revalidation (because entry-timeout has not passed), and open() is again retried on the old file handle and finally the ESTALE is going back to the application. Fix: if revalidation is happening with LOOKUP_REVAL, then ignore entry-timeout and always do the up-call. Signed-off-by: Anand Avati Reviewed-by: Niels de Vos Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit 35320d7b32ba8cfafc5300ef98f8be4d9ba81e09 Author: Miklos Szeredi Date: Mon Jul 7 15:28:51 2014 +0200 fuse: handle large user and group ID commit 233a01fa9c4c7c41238537e8db8434667ff28a2f upstream. If the number in "user_id=N" or "group_id=N" mount options was larger than INT_MAX then fuse returned EINVAL. Fix this to handle all valid uid/gid values. Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit b6efa6c43bd6db53c7d5db71c834817a13dca059 Author: Miklos Szeredi Date: Mon Jul 7 15:28:50 2014 +0200 fuse: timeout comparison fix commit 126b9d4365b110c157bc4cbc32540dfa66c9c85a upstream. As suggested by checkpatch.pl, use time_before64() instead of direct comparison of jiffies64 values. Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit 32c24ed1f0b34c1c05df46029eccef426b20f3e5 Author: Loic Poulain Date: Mon Jun 23 17:42:44 2014 +0200 Bluetooth: Ignore H5 non-link packets in non-active state commit 48439d501e3d9e8634bdc0c418e066870039599d upstream. When detecting a non-link packet, h5_reset_rx() frees the Rx skb. Not returning after that will cause the upcoming h5_rx_payload() call to dereference a now NULL Rx skb and trigger a kernel oops. Signed-off-by: Loic Poulain Signed-off-by: Marcel Holtmann Signed-off-by: Greg Kroah-Hartman commit d887a032c8d56b4b525f698ac2354876f6a4a5b4 Author: K. Y. Srinivasan Date: Mon Jul 7 16:34:25 2014 -0700 Drivers: hv: util: Fix a bug in the KVP code commit 9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc upstream. Add code to poll the channel since we process only one message at a time and the host may not interrupt us. Also increase the receive buffer size since some KVP messages are close to 8K bytes in size. Signed-off-by: K. Y. Srinivasan Signed-off-by: Greg Kroah-Hartman commit 815ba8f81857efadce6af198b07731cebf3e6b35 Author: Mengdong Lin Date: Tue Mar 11 17:12:52 2014 -0400 ALSA: hda - initialize audio InfoFrame to be all zero commit caaf5ef9493f72390905f1e97b310b8906d32dac upstream. This patch initialized the local audio InfoFrame variable 'ai' to be all zero, thus the data bytes will indicate "Refer to Stream Header" by default. Signed-off-by: Mengdong Lin Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 2e47bede83f5ec53627e0d3c8a1634441942b495 Author: Takashi Iwai Date: Tue Jul 15 15:19:43 2014 +0200 ALSA: hda - Fix broken PM due to incomplete i915 initialization commit 4da63c6fc426023d1a20e45508c47d7d68c6a53d upstream. When the initialization of Intel HDMI controller fails due to missing i915 kernel symbols (e.g. HD-audio is built in while i915 is module), the driver discontinues the probe. However, since the probe was done asynchronously, the driver object still remains, thus the relevant PM ops are still called at suspend/resume. This results in the bad access to the incomplete audio card object, eventually leads to Oops or stall at PM. This patch adds the missing checks of chip->init_failed flag at each PM callback in order to fix the problem above. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79561 Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit e8e1c3d27988668296652e62de1335f3c4940f22 Author: Hans de Goede Date: Wed Jul 9 06:20:44 2014 -0300 media: gspca_pac7302: Add new usb-id for Genius i-Look 317 commit 242841d3d71191348f98310e2d2001e1001d8630 upstream. Tested-and-reported-by: yullaw Signed-off-by: Hans de Goede Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 3b5e2408ca94673c8b1d61d388ca67d282c12fd1 Author: Abbas Raza Date: Thu Jul 17 19:34:31 2014 +0800 usb: chipidea: udc: Disable auto ZLP generation on ep0 commit 953c66469735aed8d2ada639a72b150f01dae605 upstream. There are 2 methods for ZLP (zero-length packet) generation: 1) In software 2) Automatic generation by device controller 1) is implemented in UDC driver and it attaches ZLP to IN packet if descriptor->size < wLength 2) can be enabled/disabled by setting ZLT bit in the QH When gadget ffs is connected to ubuntu host, the host sends get descriptor request and wLength in setup packet is 255 while the size of descriptor which will be sent by gadget in IN packet is 64 byte. So the composite driver sets req->zero = 1. In UDC driver following code will be executed then if (hwreq->req.zero && hwreq->req.length && (hwreq->req.length % hwep->ep.maxpacket == 0)) add_td_to_list(hwep, hwreq, 0); Case-A: So in case of ubuntu host, UDC driver will attach a ZLP to the IN packet. ubuntu host will request 255 byte in IN request, gadget will send 64 byte with ZLP and host will come to know that there is no more data. But hold on, by default ZLT=0 for endpoint 0 so hardware also tries to automatically generate the ZLP which blocks enumeration for ~6 seconds due to endpoint 0 STALL, NAKs are sent to host for any requests (OUT/PING) Case-B: In case when gadget ffs is connected to Apple device, Apple device sends setup packet with wLength=64. So descriptor->size = 64 and wLength=64 therefore req->zero = 0 and UDC driver will not attach any ZLP to the IN packet. Apple device requests 64 bytes, gets 64 bytes and doesn't further request for IN data. But ZLT=0 by default for endpoint 0 so hardware tries to automatically generate the ZLP which blocks enumeration for ~6 seconds due to endpoint 0 STALL, NAKs are sent to host for any requests (OUT/PING) According to USB2.0 specs: 8.5.3.2 Variable-length Data Stage A control pipe may have a variable-length data phase in which the host requests more data than is contained in the specified data structure. When all of the data structure is returned to the host, the function should indicate that the Data stage is ended by returning a packet that is shorter than the MaxPacketSize for the pipe. If the data structure is an exact multiple of wMaxPacketSize for the pipe, the function will return a zero-length packet to indicate the end of the Data stage. In Case-A mentioned above: If we disable software ZLP generation & ZLT=0 for endpoint 0 OR if software ZLP generation is not disabled but we set ZLT=1 for endpoint 0 then enumeration doesn't block for 6 seconds. In Case-B mentioned above: If we disable software ZLP generation & ZLT=0 for endpoint then enumeration still blocks due to ZLP automatically generated by hardware and host not needing it. But if we keep software ZLP generation enabled but we set ZLT=1 for endpoint 0 then enumeration doesn't block for 6 seconds. So the proper solution for this issue seems to disable automatic ZLP generation by hardware (i.e by setting ZLT=1 for endpoint 0) and let software (UDC driver) handle the ZLP generation based on req->zero field. Signed-off-by: Abbas Raza Signed-off-by: Peter Chen Signed-off-by: Greg Kroah-Hartman commit e354b2e1f10aceec9ab4a3dd0e1078ada894f45b Author: Gavin Guo Date: Fri Jul 18 01:12:13 2014 +0800 usb: Check if port status is equal to RxDetect commit bb86cf569bbd7ad4dce581a37c7fbd748057e9dc upstream. When using USB 3.0 pen drive with the [AMD] FCH USB XHCI Controller [1022:7814], the second hotplugging will experience the USB 3.0 pen drive is recognized as high-speed device. After bisecting the kernel, I found the commit number 41e7e056cdc662f704fa9262e5c6e213b4ab45dd (USB: Allow USB 3.0 ports to be disabled.) causes the bug. After doing some experiments, the bug can be fixed by avoiding executing the function hub_usb3_port_disable(). Because the port status with [AMD] FCH USB XHCI Controlleris [1022:7814] is already in RxDetect (I tried printing out the port status before setting to Disabled state), it's reasonable to check the port status before really executing hub_usb3_port_disable(). Fixes: 41e7e056cdc6 (USB: Allow USB 3.0 ports to be disabled.) Signed-off-by: Gavin Guo Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman