Ticket #2200 (closed defect: duplicate)

Opened 7 years ago

Last modified 5 years ago

S2875 k8temp reports different values from BIOS and w83627hf

Reported by: tuskentower@… Owned by: somebody
Priority: minor Milestone:
Component: hardware Version:
Keywords: S2875 k8temp Cc:

Description

I just rebuilt my kernel to try something out, and I enabled the k8temp kernel module. When I ran sensors I saw two new values corresponding to the two Opterons on the motherboard. These values differ by ~10 degrees C from the w83627hf and BIOS readings (I have confirmed that those two agree with each other).

I am using the sensors.conf that Tyan hosts  somewhere on their website. I changed the config file slightly to get the correct temp values and fan speeds. I am attaching this file and lspci output.

Motherboard: Tyan S2875
Distro: Debian Etch testing with a custom Linux 2.6.21-rc2 kernel
lm-sensors: 2.10.1-3

I just started looking at this 20 minutes ago and I saw another ticket, #2152, that looks similar except that it applies to a different board. If the problem is the same, we can close this one.

Attachments

sensors Download (1.3 KB) - added by ticket 7 years ago.
lm-sensors output
lspci Download (3.0 KB) - added by ticket 7 years ago.
lspci output
lspci-hexdump-opteron-host-bridge Download (348 bytes) - added by ticket 7 years ago.
hex dump the AMD hostbridge
cpuinfo Download (1.3 KB) - added by ticket 7 years ago.
/proc/cpuinfo
k8temp_diodeoffset.patch Download (0.8 KB) - added by ticket 7 years ago.
patch updated for style
s2875-changed-sensors.conf Download (2.1 KB) - added by ticket 7 years ago.
config file updated for +7 degrees C offset

Change History

Changed 7 years ago by ticket

lm-sensors output

Changed 7 years ago by ticket

lspci output

Changed 7 years ago by ticket

hex dump the AMD hostbridge

Changed 7 years ago by ticket

/proc/cpuinfo

in reply to: ↑ description   Changed 7 years ago by ticket

Adding sensors output inline:

k8temp-pci-00c3
Adapter: PCI adapter
temp1: +37°C

k8temp-pci-00cb
Adapter: PCI adapter
temp1: +31°C

w83627hf-isa-0290
Adapter: ISA adapter
...
CPU1 Temp: +48.0°C (high = +80°C, hyst = +75°C) sensor = diode
CPU2 Temp: +43.0°C (high = +80°C, hyst = +75°C) sensor = diode
...

CPU 1 runs hotter than CPU2 for no apparent reason (aside from me screwing up while mounting the heatsink). The values reported by the w83627hf chipset have been confirmed by the values that I have seen in the BIOS.

  Changed 7 years ago by ticket

I read the k8temp kernel doc, which pointed me to AMD's datasheet. There I read section 4.6.23, "Thermtrip Status Register", which includes the following excerpt:

Diode Offset (DiodeOffset[5:0]), Bits 13–8. Thermal diode offset is used to correct the measurement made by an external temperature sensor. This diode offset supports temperature sensors using two sourcing currents only. Other sourcing current implementations are not compatible with the diode offset and are not supported by AMD. The allowable offset range is provided in the appropriate processor functional data sheet, and the maximum offset can vary for different processors. A correction to the offset may be needed for some temperature sensors. Contact the temperature sensor vendor to determine whether an offset correction is needed.

To me that paragraph means that I should read the datasheet for my processors, 2 GHz 246HE (stepping CG, I believe). That last line confuses me though. I'm guessing that my confusion comes from lumping together the diode itself and the sensor that reads the diode. Does that paragraph mean that the sensor reading the diode has its own inaccuracy in addition to the inaccuracy of the diode (which I need to read from the processor datasheet)?

  Changed 7 years ago by ticket

The processor is an OSK246CMP5AU, with a thermal resistance (case to ambient) of 0.50 °C/W (taken from 30417.pdf, May 2006). I'm not sure what those numbers mean, but they seem important. The Tcase max values are given, but that is really the case temperature and has nothing to do with the diode. Any idea where else I should look?


I read the driver code and more of the tech spec. The driver is not pulling the diode offset information. Does anyone know if pulling the diode offset will help generate better values?

It also looks like the temp measurement might be different with the revision G processors, but the  spec sheet didn't make sense to me in the first pass.

  Changed 7 years ago by ticket

This is the patch that I created to utilize the temperature sensor diode offset in the k8temp.

diff -uprN kernel.orig/drivers/hwmon/k8temp.c kernel/drivers/hwmon/k8temp.c
--- kernel.orig/drivers/hwmon/k8temp.c  2007-04-12 23:15:02.000000000 -0400
+++ kernel/drivers/hwmon/k8temp.c       2007-04-12 23:13:53.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/mutex.h>
 
 #define TEMP_FROM_REG(val)     (((((val) >> 16) & 0xff) - 49) * 1000)
+#define OFFSET_FROM_REG(val)   ((((val) >> 8) & 0x3f) ? (11 - (int)(((val) >> 8) & 0x3f)) : 0)
 #define REG_TEMP       0xe4
 #define SEL_PLACE      0x40
 #define SEL_CORE       0x04
@@ -117,7 +118,8 @@ static ssize_t show_temp(struct device *
        struct k8temp_data *data = k8temp_update_device(dev);
 
        return sprintf(buf, "%d\n",
-                      TEMP_FROM_REG(data->temp[core][place]));
+                       ((int) TEMP_FROM_REG(data->temp[core][place])) +
+                       (((int) OFFSET_FROM_REG(data->temp[core][place]))*1000));
 }
 
 /* core, place */

After I patched the k8temp driver, the k8temp-reported temperatures now differ from the BIOS-reported temperatures by 6 degrees C. The upside is that the difference is now uniform instead of varying from one CPU to the other. Now I have to figure out how to use the config file (aka RTFM).

Changed 7 years ago by ticket

patch updated for style

Changed 7 years ago by ticket

config file updated for +7 degrees C offset

  Changed 7 years ago by ticket

After updating the k8temp module to use the Diode Offset, I found that the temps reported were off by 6 to 7 degrees C. I updated my sensors.conf file to fix this discrepancy, and now the numbers look about the same:

k8temp-pci-00c3
Adapter: PCI adapter
CPU1 Temp: +47°C

k8temp-pci-00cb
Adapter: PCI adapter
CPU2 Temp: +43°C

w83627hf-isa-0290
Adapter: ISA adapter
+1.8V: +1.84 V (min = +1.71 V, max = +1.89 V)
CPU VRM: +1.31 V (min = +1.23 V, max = +1.36 V)
+3.3V: +3.25 V (min = +3.14 V, max = +3.47 V)
in3: +2.98 V (min = +2.74 V, max = +3.89 V)
5VSB: +5.26 V (min = +4.50 V, max = +5.50 V)
-12V: -12.41 V (min = -12.48 V, max = -12.27 V) ALARM
HT Volt: +1.23 V (min = +1.26 V, max = +1.14 V) ALARM
in7: +3.20 V (min = +3.87 V, max = +3.71 V) ALARM
VBat: +3.14 V (min = +2.40 V, max = +3.60 V)
CPU2 Fan: 168750 RPM (min = 0 RPM, div = 8)
CPU1 Fan: 1917 RPM (min = 0 RPM, div = 8)
SYS Fan: 0 RPM (min = 2657 RPM, div = 2) ALARM
Mobo Temp: +63°C (high = +99°C, hyst = +44°C) sensor = thermistor
CPU1 Temp: +46.5°C (high = +80°C, hyst = +75°C) sensor = diode
CPU2 Temp: +42.5°C (high = +80°C, hyst = +75°C) sensor = diode
vid: +1.300 V (VRM Version 2.4)
alarms:
beep_enable:

Sound alarm enabled

follow-up: ↓ 7   Changed 7 years ago by khali

  • cc tuskentower@… removed
  • keywords w83627hf removed
  • reporter changed from ticket to tuskentower@…

Please attach the output of:

lspci -xxx -s 00:18.3
lspci -xxx -s 00:19.3

but as root this time, so that we see all the registers.

in reply to: ↑ 6   Changed 7 years ago by ticket

sudo lspci -xxx -s 00:18.3
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00: 22 10 03 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: ff 3b 00 00 40 00 c0 00 00 00 00 00 00 00 00 00
50: e0 c3 8e bc 00 00 00 00 00 00 00 00 80 af 31 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 01 02 51 11 80 00 50 00 38 00 08 1b 22 00 00
80: 00 00 07 23 13 21 13 00 00 00 00 00 00 00 00 00
90: 05 00 00 00 70 00 00 00 00 c0 bd 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 3d 00 00 80 fb 80 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 07 07 e2 04 10 27 00 20 00 25 00 00
e0: 00 00 00 00 20 07 59 00 1b 01 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

sudo  lspci -xxx -s 00:19.3
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00: 22 10 03 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: ff 3b 00 00 40 00 40 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 c7 ff be
60: 77 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 01 02 51 11 80 00 50 00 38 00 08 1b 22 00 00
80: 00 00 07 23 13 21 13 00 00 00 00 00 00 00 00 00
90: 05 00 00 00 70 00 00 00 00 c0 bd 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 1f 00 00 40 a2 b0 b9 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 07 07 e2 04 10 27 00 20 00 25 00 00
e0: 00 00 00 00 20 04 53 00 1b 01 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

  Changed 7 years ago by ticket

I have an AMD 64 X2 3600+ and I've had a similar experience. I also looked at the AMD spec and found the interesting values about diode offset.

In my case the diode offset is always 21C. I added code to the driver to dump out the raw values of the word in question.

One set of values (4 numbers: 2 CPU cores x 2 sensors) was:

3ba03a 30e07a 31e03e 2c207e

As a sanity check, these have the correct bits set to select core 0/1 and sensor 0/1. I wrote a small C program to format the values as per the data sheet:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int i;
    long val, diode_offset, curr_temp, tj_offset;
    float f_diode_offset, f_curr_temp, f_t_control;

    for (i = 1; i < argc; ++i) {
        /* Each argument is one raw register value in hex */
        val = strtol(argv[i], NULL, 16);
        printf("Arg[%d] = %lX\n", i, val);

        /* Extract the bit fields */
        diode_offset = 0x3F & (val >> 8);
        curr_temp = 0x3FF & (val >> 14);
        tj_offset = 0x1F & (val >> 24);

        printf("raw:diode_offset = %ld\n", diode_offset);
        printf("raw:curr_temp = %ld\n", curr_temp);
        printf("raw:tj_offest = %ld\n", tj_offset);

        /* Scale the raw fields to degrees C */
        f_diode_offset = (float)diode_offset - 11;
        f_curr_temp = (float)curr_temp / 4 - 49;
        f_t_control = f_curr_temp - (float)tj_offset * 2 - 49;

        printf("diode_offset = %f\n", f_diode_offset);
        printf("curr_temp = %f\n", f_curr_temp);
        printf("f_t_control = %f\n", f_t_control);

        printf("Guess = %f\n", f_diode_offset + f_curr_temp);

        printf("===============\n\n\n");
    }

    return 0;
}

  Changed 7 years ago by ticket

This is graeme, to continue the above example. I ran a CPU load via taskset to drive one core HOT, and took more readings. Formatting these via the above utility reveals:

graeme@mediabox:~/src$ ./testamd 40A03A 37A07A 38603E 32E07E
Arg[1] = 40A03A
raw:diode_offset = 32
raw:curr_temp = 258
raw:tj_offest = 0
diode_offset = 21.000000
curr_temp = 15.500000
f_t_control = -33.500000
Guess = 36.500000
===============

Arg[2] = 37A07A
raw:diode_offset = 32
raw:curr_temp = 222
raw:tj_offest = 0
diode_offset = 21.000000
curr_temp = 6.500000
f_t_control = -42.500000
Guess = 27.500000
===============

Arg[3] = 38603E
raw:diode_offset = 32
raw:curr_temp = 225
raw:tj_offest = 0
diode_offset = 21.000000
curr_temp = 7.250000
f_t_control = -41.750000
Guess = 28.250000
===============

Arg[4] = 32E07E
raw:diode_offset = 32
raw:curr_temp = 203
raw:tj_offest = 0
diode_offset = 21.000000
curr_temp = 1.750000
f_t_control = -47.250000
Guess = 22.750000
===============

The output from sensors is:

root@mediabox:~# sensors
k8temp-pci-00c3
Adapter: PCI adapter
temp1: +11°C
temp2: +2°C
temp3: +1°C
temp4: +1073738°C

it8716-isa-0290
Adapter: ISA adapter
VCore: +1.41 V (min = +0.00 V, max = +5.55 V)
+3.3V: +3.28 V (min = +3.13 V, max = +3.47 V)
+5V: +5.05 V (min = +4.74 V, max = +5.25 V)
+12V: +12.06 V (min = +0.00 V, max = +17.18 V)
5VSB: +5.16 V (min = +4.75 V, max = +5.25 V)
VBat: +2.91 V
CPU Fan: 1642 RPM (min = 399 RPM)
Case Fan1: 0 RPM (min = 1997 RPM) ALARM
Case Fan2: 0 RPM (min = 0 RPM)
CPU Temp: +21°C (low = +10°C, high = +60°C) sensor = diode
M/B Temp: +32°C (low = +10°C, high = +50°C) sensor = thermistor
vid: +0.000 V

root@mediabox:~# cat /etc/sensors.conf

...elided ...

#
##########################################################################
#### Here begins the real configuration file

# These values were found, here:
#
#  http://www.abclinuxu.cz/forum/show/145943
#
# These were a match for: M2NPV-VM and it8716-isa-0290
# This I think is therefore an ASUS m2npv-vm motherboard with lm-sensors
#
# I can't read the text, but the 'before case' is a very good match for
# what I got:
#
# Before:
#
#k8temp-pci-00c3
#Adapter: PCI adapter
#Core0 Temp:
# +11°C
#Core0 Temp:
# +1°C
#Core1 Temp:
# +2°C
#Core1 Temp:
# -4°C
#
#it8716-isa-0290
#Adapter: ISA adapter
#VCore: +1.04 V (min = +0.00 V, max = +4.08 V)
#VDDR: +3.14 V (min = +0.00 V, max = +4.08 V)
#+3.3V: +0.00 V (min = +0.00 V, max = +4.08 V) ALARM
#+5V: +4.78 V (min = +0.00 V, max = +6.85 V)
#+12V: +11.46 V (min = +0.00 V, max = +16.32 V)
#-12V: -16.97 V (min = -16.97 V, max = +4.01 V) ALARM
#-5V: -8.78 V (min = -8.78 V, max = +4.05 V) ALARM
#5VSB: +4.73 V (min = +0.00 V, max = +6.85 V)
#VBat: +2.91 V
#fan1: 3096 RPM (min = 0 RPM)
#fan2: 0 RPM (min = 0 RPM)
#fan3: 0 RPM (min = 0 RPM)
#temp1: +22°C (low = -1°C, high = +127°C) sensor = diode
#temp2: +32°C (low = -1°C, high = +127°C) sensor = thermistor
#temp3: +25°C (low = -1°C, high = +127°C) sensor = thermistor
#vid: +0.000 V
#
#
# Note the K8temp section is not affected by this.
#
#

chip "it8716-*"

ignore in2
ignore in5
ignore in6

label in0 "VCore"
label in1 "+3.3V"
label in3 "+5V" # VCC
label in4 "+12V"
label in7 "5VSB" # VCCH
label in8 "VBat"

compute in0 @*1.36 , @/1.36
compute in1 @*1.047 , @/1.047
compute in3 @*1.773 , @/1.773
compute in4 @*4.21 , @/4.21
compute in7 @*1.833 , @/1.833

set in1_min 3.3 * 0.95
set in1_max 3.3 * 1.05
set in3_min 5 * 0.95
set in3_max 5 * 1.05
set in6_max -5 * 0.95
set in6_min -5 * 1.05
set in7_min 5 * 0.95
set in7_max 5 * 1.05

label temp1 "CPU Temp"
label temp2 "M/B Temp"
ignore temp3

set temp1_over 60
set temp1_low 10
set temp2_over 50
set temp2_low 10

label fan1 "CPU Fan"
label fan2 "Case Fan1"
label fan3 "Case Fan2"
# ignore fan3

set fan1_min 400
set fan2_min 2000

#
# Reading  http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32559.pdf
#
# My guess is 000H = -49.00C
#             001H = -48.75C
#             002H = -48.50C
#
# So I think we want to do:
#     <raw value>/4 - 49
# However this is already done in the driver (I read the source) so
# it should not be needed. Anyhow the values appear to be rubbish
#

#chip "k8temp-*"

label temp1 "Core0 Temp"
label temp2 "Core0 Temp"
label temp3 "Core1 Temp"
label temp4 "Core1 Temp"

#ompute temp1 @+21 , @-21
#ompute temp2 @+21 , @-21
#ompute temp3 @+21 , @-21
#ompute temp4 @+21 , @-21

FYI, the BIOS reports things like CPU=32C MB = 34C (of course not at this exact time and the CPU varies very quickly)

  Changed 5 years ago by khali

  • status changed from new to closed
  • resolution set to duplicate

This is indeed a duplicate of #2152, #2278 and many other tickets.
