| --- | Log | opened Mon Jan 22 00:00:48 2007 |
| 00:53 | |-| | ram [~ram@pool-71-245-96-74.nycmny.fios.verizon.net] has quit [Ping timeout: 480 seconds] |
| 00:59 | |-| | HuK0B [~HuK0B@89.190.202.6] has joined #uml |
| 01:14 | |-| | motp [~motp@83.236.181.13] has joined #uml |
| 01:20 | <HuK0B> | hmm.. I have a strange problem on 2 uml servers already. I don't know if my fs is broken or something, but sometimes when I write to some place it gives me a kernel panic and the uml stops |
| 04:32 | |-| | motp [~motp@83.236.181.13] has quit [Quit: Leaving] |
| 08:37 | <dgraves> | morning. |
| 11:02 | |-| | hfb [~hfb@pool-71-160-242-6.lsanca.dsl-w.verizon.net] has joined #uml |
| 11:40 | |-| | ram [~ram@pool-71-245-96-74.nycmny.fios.verizon.net] has joined #uml |
| 12:09 | |-| | hfb [~hfb@pool-71-160-242-6.lsanca.dsl-w.verizon.net] has left #uml [Leaving] |
| 12:10 | <HuK0B> | hmm i got some strange errors |
| 12:10 | <HuK0B> | end_request: I/O error, dev ubdc, sector 27633040 |
| 12:10 | <HuK0B> | do_io - write failed err = 28 fd = 22 |
| 12:10 | <HuK0B> | end_request: I/O error, dev ubdc, sector 27633048 |
| 12:10 | <HuK0B> | do_io - write failed err = 28 fd = 22 |
| 12:10 | <HuK0B> | Kernel panic - not syncing: switch_mm_skas - PTRACE_SWITCH_MM failed, errno = 3 |
| 12:10 | <HuK0B> | any ideas? |
| 12:23 | <dgraves> | HuK0B: is your ubd backing file sparse? |
| 12:24 | <HuK0B> | ? I didn't understand what you mean |
| 12:24 | <HuK0B> | today is some kind of bad day, almost 50% of my umls just stopped |
| 12:24 | <HuK0B> | and they get a kernel panic when somebody writes to some dir |
| 12:24 | <HuK0B> | I checked the filesystems many times |
| 12:26 | <dgraves> | HuK0B: the file you have mounted on ubdc. does its ls -s size match its ls -l size? |
| 12:28 | <HuK0B> | no, they don't match |
| 12:30 | <HuK0B> | Buffer I/O error on device ubda, logical block 2590062 |
| 12:30 | <HuK0B> | lost page write due to I/O error on ubda |
| 12:30 | <HuK0B> | do_io - write failed err = 28 fd = 11 |
| 12:30 | <HuK0B> | end_request: I/O error, dev ubda, sector 20720504 |
| 12:30 | <HuK0B> | Buffer I/O error on device ubda, logical block 2590063 |
| 12:30 | <HuK0B> | lost page write due to I/O error on ubda |
| 12:30 | <HuK0B> | do_io - write failed err = 28 fd = 11 |
| 12:30 | <HuK0B> | strange, I got these errors |
| 12:30 | <HuK0B> | and after some time |
| 12:30 | <HuK0B> | EIP: 0073:[<400ed5e8>] CPU: 0 Not tainted ESP: 007b:bfb5b1c0 EFLAGS: 00000212 |
| 12:30 | <HuK0B> | Not tainted |
| 12:30 | <HuK0B> | EAX: ffffffda EBX: 00000011 ECX: bfb5b660 EDX: 00000006 |
| 12:30 | <HuK0B> | ESI: 00000006 EDI: 4014df40 EBP: bfb5b1d8 DS: 007b ES: 007b |
| 12:30 | <HuK0B> | 083b77d0: [<0807273c>] show_regs+0xb4/0xb6 |
| 12:30 | <HuK0B> | end_request: I/O error, dev ubda, sector 20720520 |
| 12:30 | <HuK0B> | Buffer I/O error on device ubda, logical block 2590065 |
| 12:30 | <HuK0B> | lost page write due to I/O error on ubda |
| 12:30 | <HuK0B> | do_io - write failed err = 28 fd = 11 |
| 12:30 | <HuK0B> | .... |
| 12:30 | <HuK0B> | 083b77fc: [<0805fe85>] panic_exit+0x25/0x3f |
| 12:30 | <HuK0B> | 083b780c: [<08086603>] notifier_call_chain+0x1c/0x3c |
| 12:30 | <HuK0B> | 083b782c: [<08086699>] atomic_notifier_call_chain+0x11/0x16 |
| 12:30 | <HuK0B> | 083b7840: [<0807a2ce>] panic+0x4b/0xd8 |
| 12:30 | <HuK0B> | ... |
| 12:31 | <dgraves> | HuK0B: is your root filesystem on your host (or whatever filesystem you have these created on) full? |
| 12:31 | <HuK0B> | and the uml stops |
| 12:31 | <HuK0B> | no, there is enough space left |
| 12:34 | <HuK0B> | hmm strange |
| 12:34 | <HuK0B> | it is full but Avail 0 Used 283G Size 294G |
| 12:34 | <HuK0B> | and when I remove something it is full again |
| 12:34 | <HuK0B> | why? |
| 12:46 | <HuK0B> | ok right tnx for help |
| 12:46 | |-| | jdike [~jdike@pool-71-174-247-179.bstnma.fios.verizon.net] has joined #uml |
| 12:46 | <jdike> | Hi guys |
| 13:13 | |-| | kos_tom [~thomas@humanoidz.org] has joined #uml |
| 13:16 | |-| | kokoko1 [~Slacker@203.148.65.8] has joined #uml |
| 13:16 | <kokoko1> | hiya |
| 13:18 | <dgraves> | jdike: hey, the raw stuff worked. |
| 13:18 | <dgraves> | HuK0B: sorry, i had to step out for a bit. |
| 13:18 | <dgraves> | HuK0B: the problem is probably that your backing files are sparse. how did you create them? |
| 13:19 | <jdike> | cool |
| 13:19 | <jdike> | I expected it would |
| 13:19 | <jdike> | kokoko1, have you seen whether your UMLs are dropping core files yet? |
| 13:20 | |-| | kos_tom [~thomas@humanoidz.org] has quit [Quit: I like core dumps] |
| 13:21 | |-| | kos_tom [~thomas@humanoidz.org] has joined #uml |
| 13:22 | <dgraves> | jdike: thanks. we had to enable RAW DEV AND MAX_RAW_DEVICES but it works as expected. :) |
| 13:23 | <jdike> | right |
| 13:23 | <jdike> | do you have a patch I can forward to mainline? |
| 13:23 | <dgraves> | ::LOL:: |
| 13:23 | <dgraves> | nope. |
| 13:24 | <dgraves> | developer had changed so much else it wasn't funny. |
| 13:24 | <dgraves> | i'll see if i can whip one up for you. |
| 13:25 | |-| | richardw [~richardw@M260P009.adsl.highway.telekom.at] has joined #uml |
| 13:27 | <jdike> | hehe |
| 13:27 | <jdike> | what's there to change? |
| 13:27 | <jdike> | dump a couple of config declarations in Kconfig.char or whatever and away you go |
| 13:27 | <dgraves> | right. exactly. |
| 13:28 | <dgraves> | in fact, that's all we did. |
| 13:28 | <dgraves> | however, the developer had prechanged a lot of things. |
| 13:28 | <dgraves> | so his tree wasn't good for a patch baseline. |
| 13:28 | <dgraves> | and he didn't have quilt set up. ;) |
| 13:28 | <jdike> | OK |
| 13:33 | <dgraves> | jdike: what's the patch command line you'd like me to use? |
| 13:33 | <dgraves> | diff, i mean. |
| 13:36 | <jdike> | diff -Nur |
| 13:36 | <jdike> | at the root of the kernel tree |
| 13:37 | <kokoko1> | jdike, sorry i was away |
| 13:37 | <kokoko1> | jdike, nope its not dropping core files :( |
| 13:37 | <jdike> | and it's still dying |
| 13:37 | <kokoko1> | yes :( |
| 13:38 | <jdike> | well, I'd at least like the exit status from the UML |
| 13:38 | <jdike> | that will tell me something |
| 13:38 | <kokoko1> | dgraves, howdy |
| 13:39 | <dgraves> | kokoko1: heya. |
| 13:40 | <dgraves> | jdike: email to? |
| 13:40 | <dgraves> | sorry, lost my address book. |
| 13:40 | <kokoko1> | all my observation is, this uml keeps dying since we started using SA (spamd) on our mail server |
| 13:42 | <dgraves> | jdike: it's in the mail. |
| 13:42 | <dgraves> | hope i did it right. |
| 13:42 | <dgraves> | i need to set up quilt again, lost it on my box. |
| 13:42 | <dgraves> | lost my box too. :) |
| 14:07 | |-| | tyler [~tyler@89.98.144.15] has joined #uml |
| 14:18 | |-| | HuK0B [~HuK0B@89.190.202.6] has quit [Ping timeout: 480 seconds] |
| 14:18 | <kokoko1> | heh, dgraves you lost it ? :P |
| 14:26 | <jdike> | kokoko1, can you boot a test UML, send it a SIGABRT and see if it dumps core? |
| 14:26 | <jdike> | dgraves, tx |
| 14:28 | <dgraves> | kokoko1: yeah. :) it died in a gentoo update. |
| 14:28 | <dgraves> | so i went to kubuntu. |
| 14:28 | <kokoko1> | jdike, sure, i'll let you know; atm doing some important work |
| 14:28 | <jdike> | OK |
| 14:28 | [~] | kokoko1 rebooting xen hosts into new kernel-xen :S |
| 14:29 | <kokoko1> | heh i am kinda nervous when doing these remote reboots |
| 14:29 | <kokoko1> | okay, here one host comes back :D |
| 14:30 | <kokoko1> | jdike, i am tired of FC :( |
| 14:30 | <jdike> | why? |
| 14:31 | <kokoko1> | a lot of time is spent on updating the machines and vms |
| 14:31 | <kokoko1> | lots of updates each day :( |
| 14:31 | <kokoko1> | now see this 2.6.19-1.2895.fc6xen |
| 14:31 | <jdike> | Oh |
| 14:31 | <jdike> | there are lots of updates, but they don't take me a lot of time |
| 14:31 | <jdike> | just hit 'y' a few times and in they come |
| 14:31 | <kokoko1> | yep same here |
| 14:32 | <kokoko1> | but there are 30+ machines |
| 14:32 | <jdike> | yeah |
| 14:32 | <kokoko1> | i mean vms + hosts |
| 14:32 | <jdike> | I have 6-7 |
| 14:33 | <kokoko1> | i tried to convince the boss to switch to centos but he didn't agree :S |
| 14:33 | <kokoko1> | fedora is not for production IMO |
| 14:33 | <kokoko1> | even fc ppl say they can't recommend FC for production use |
| 14:34 | <jdike> | what is, then? |
| 14:34 | <jdike> | spending money on RHEL? |
| 14:35 | <jdike> | I suppose you can dl it for free too |
| 14:36 | <kokoko1> | RHEL is only valid for 30 days, after that you will not get updates :S |
| 14:36 | <kokoko1> | centos == RHEL |
| 14:36 | <kokoko1> | jdike, interesting |
| 14:37 | <jdike> | OK, centos is RHEL with > 1 month of updates? |
| 14:37 | <jdike> | what's interesting? |
| 14:37 | <kokoko1> | i just rebooted one of the xen hosts, and the host uptime and its vms' uptime are different |
| 14:37 | <kokoko1> | looks like xen saves the running vm state to a file |
| 14:37 | <kokoko1> | and starts it from right there |
| 14:51 | <kokoko1> | jdike, were you on vacation? |
| 14:51 | <jdike> | not really |
| 14:51 | <jdike> | LCA in Sydney |
| 14:51 | <kokoko1> | ah right , that's why i didn't see you in the # ;) |
| 14:51 | <kokoko1> | LCA = ? |
| 14:52 | <jdike> | yup |
| 14:53 | <jdike> | linux.conf.au |
| 14:53 | <kokoko1> | Oh right :) |
| 14:56 | |-| | Coder7 [~bhook@164.113.205.197] has joined #uml |
| 15:01 | <kokoko1> | so it was fun there? |
| 15:01 | <jdike> | yup |
| 15:01 | <kokoko1> | nice |
| 15:04 | <Coder7> | any clue why I'm getting this error when I use tunctl? TUNSETIFF: Operation not permitted |
| 15:06 | <jdike> | uml_net not suid root? |
| 15:06 | <Coder7> | hrm, let me check |
| 15:07 | <Coder7> | it is |
| 15:07 | <Coder7> | I can't figure it out, it works on one machine, and not another |
| 15:08 | <jdike> | anything in the host's dmesg? |
| 15:08 | <Coder7> | the only major difference is that one machine is running slackware 10.2, the other 11 |
| 15:08 | <jdike> | whoops, uml_net doesn't matter |
| 15:08 | <jdike> | I was thinking you were seeing that in UML, missed the tunctl bit |
| 15:09 | <jdike> | are you running tunctl as root? |
| 15:09 | <Coder7> | no, not as root |
| 15:09 | <Coder7> | I did just find an error in syslog |
| 15:11 | <Coder7> | eh, but that error is not being generated by the command |
| 15:11 | <jdike> | You have to be privileged in order to change network interfaces |
| 15:11 | <Coder7> | I'm running it as a member of the uml group, and I have /dev/net/tun set to root:uml 660 |
| 15:12 | <Coder7> | it's working on one machine, and I don't recall doing anything other than changing the permissions |
| 15:12 | <jdike> | OK, I guess works |
| 15:12 | <jdike> | +that |
| 15:12 | <Coder7> | it does allow me to delete tap devices as a regular user, but not add them |
| 15:12 | <jdike> | can you just try as root to see what that does |
| 15:12 | <jdike> | what's the command line? |
| 15:13 | <Coder7> | it does work as root |
| 15:13 | <Coder7> | tunctl -b -u $USER |
| 15:13 | <Coder7> | and tunctl -d tap0 works as a regular user |
| 15:14 | <jdike> | that suggests a permission problem then |
| 15:14 | <Coder7> | right, but I've checked and double checked them |
| 15:14 | <jdike> | maybe the TUN/TAP driver changed how it deals with privileges |
| 15:14 | <Coder7> | eh, I am using different kernel versions |
| 15:15 | <jdike> | not just the permissions on the file, but how they are handled in the driver |
| 15:15 | <jdike> | is the new one or the old one giving problems? |
| 15:15 | <Coder7> | new one is giving problems |
| 15:16 | <Coder7> | 2.6.19 |
| 15:17 | |-| | tyler [~tyler@89.98.144.15] has quit [Ping timeout: 480 seconds] |
| 15:17 | <Coder7> | reading docs now |
| 15:17 | <jdike> | The permission check is this |
| 15:17 | <jdike> | if (tun->owner != -1 && |
| 15:17 | <jdike> | current->euid != tun->owner && !capable(CAP_NET_ADMIN)) |
| 15:17 | <jdike> | return -EPERM; |
| 15:18 | <Coder7> | has CAP_NET_ADMIN been there all along, or is that new? |
| 15:18 | <jdike> | So you have to be whoever owns the device, or root |
| 15:18 | <jdike> | CAP_NET_ADMIN basically means root on normal systems |
| 15:18 | |-| | tommie [~tommie@62.235.155.142] has joined #uml |
| 15:19 | <Coder7> | but I can run tunctl to add taps on the other server, as a normal user |
| 15:21 | <Coder7> | yup, they changed the tun module |
| 15:21 | <Coder7> | I pulled up the docs on the old server, running 2.6.17 |
| 15:22 | <Coder7> | I'll just have to change things to use sudo |
| 15:22 | <jdike> | I guess they tightened up the permission checking |
| 15:25 | <Coder7> | yeah, the old docs said to only let root add devices |
| 15:25 | <Coder7> | but it allowed non-root to do it |
| 15:25 | <Coder7> | they changed it so you have to be root |
| 15:25 | <jdike> | it looks like once you assign a device to a user, that user can fiddle it |
| 15:27 | <Coder7> | correct |
| 15:27 | <Coder7> | it was stumping me though... couldn't figure out why I could delete but not add |
| 15:27 | <Coder7> | generally you get all or nothing |
| 15:28 | <Coder7> | thanks for helping |
| 15:28 | |-| | richardw_ [~richardw@M214P018.adsl.highway.telekom.at] has joined #uml |
| 15:30 | |-| | richardw [~richardw@M260P009.adsl.highway.telekom.at] has quit [Read error: Connection reset by peer] |
| 17:24 | |-| | kos_tom [~thomas@humanoidz.org] has quit [Quit: I like core dumps] |
| 17:58 | |-| | richardw_ [~richardw@M214P018.adsl.highway.telekom.at] has quit [Quit: Leaving] |
| 18:21 | |-| | Electric1lf [~dbharris@bas14-toronto12-1167996467.dsl.bell.ca] has quit [Ping timeout: 480 seconds] |
| 18:21 | |-| | ElectricElf [~dbharris@electricelf.netrep.oftc.net] has joined #uml |
| 21:07 | |-| | jdike [~jdike@pool-71-174-247-179.bstnma.fios.verizon.net] has quit [Quit: Leaving] |
| 22:35 | |-| | Nem^1 [~Nem@dslb-084-056-249-057.pools.arcor-ip.net] has joined #uml |
| 22:43 | |-| | Nem^ [~Nem@dslb-084-056-224-204.pools.arcor-ip.net] has quit [Ping timeout: 480 seconds] |
| 22:43 | |-| | Nem^1 changed nick to Nem^ |
| 22:58 | |-| | VS_ChanLog [~stats@ns.theshore.net] has left #uml [Rotating Logs] |
| 22:58 | |-| | VS_ChanLog [~stats@ns.theshore.net] has joined #uml |
| --- | Log | closed Tue Jan 23 00:00:29 2007 |
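
Note on the 12:10 to 12:46 exchange: `err = 28` is ENOSPC ("No space left on device") on Linux, and dgraves' `ls -s` versus `ls -l` comparison is the usual way to spot a sparse backing file. The check can be sketched in Python roughly as below. This is only an illustration, not anything that was run in the channel; the function name `backing_file_report` and the keys of the returned dict are made up for this note.

```python
import os


def backing_file_report(path):
    """Compare a file's apparent size (ls -l) with its allocated size (ls -s)
    and with the space still available on the host filesystem holding it."""
    st = os.stat(path)
    apparent = st.st_size              # bytes, what ls -l reports
    allocated = st.st_blocks * 512     # st_blocks is counted in 512-byte units

    fs = os.statvfs(path)
    # f_bavail excludes blocks reserved for root, so it can be 0 while
    # f_bfree is still positive (df shows "Avail 0" in that situation).
    host_available = fs.f_bavail * fs.f_frsize
    host_free = fs.f_bfree * fs.f_frsize

    unallocated = max(apparent - allocated, 0)
    return {
        "apparent": apparent,
        "allocated": allocated,
        "sparse": allocated < apparent,
        "unallocated": unallocated,
        "host_available": host_available,
        "host_free": host_free,
        # True when the guest could legitimately write more data than the
        # host can still store, which surfaces as ENOSPC in the ubd driver.
        "at_risk": unallocated > host_available,
    }
```

Calling it on each ubd backing file (for example `backing_file_report("/path/to/root_fs")`, a placeholder path) shows whether the holes in the file could still be filled by the host. The gap between `host_free` and `host_available` is one possible explanation for the "Avail 0 Used 283G Size 294G" output above, since many filesystems keep a reserve for root, but the log does not confirm that this was the cause.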